Description

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application is a continuation-in-part of United States Serial No. 08/327,522, filed October 21, 1994, and United
States Serial Number 08/327,687, filed October 24, 1994, each of which is incorporated by reference in its entirety for
all purposes.

GOVERNMENT RIGHTS 

Research leading to the invention was funded in part by NIH Grant No. , and the government may have certain rights
to the invention.

BACKGROUND OF THE INVENTION 

The relationship between structure and function of macromolecules is of fundamental importance in the understanding
of biological systems. Such relationships are important to understanding, for example, the functions of enzymes,
structural proteins, and signalling proteins, the ways in which cells communicate with one another, the mechanisms of
cellular control and metabolic feedback, etc.

Genetic information is critical in the continuation of life processes. Life is substantially informationally based, and
genetic content controls the growth and reproduction of the organism and its complements. Proteins, which are critical
features of all living systems, are encoded by the genetic materials of the cell. More particularly, the properties of
enzymes, functional proteins and structural proteins are determined by the sequence of amino acids from which they
are made. As such, it has become very important to determine the genetic sequences of nucleotides which encode the
enzymes, structural proteins and other effectors of biological functions. In addition to the segments of nucleotides which
encode polypeptides, there are many nucleotide sequences which are involved in the control and regulation of gene
expression.

The human genome project is an example of a project that is directed toward determining the complete sequence
of the genome of the human organism. Although such a sequence would not necessarily correspond to the sequence
of any specific individual, it will provide significant information as to the general organization and specific sequences
contained within genomic segments from particular individuals. It will also provide mapping information useful for further
detailed studies. The need for highly rapid, accurate, and inexpensive sequencing technology is nowhere more apparent
than in a demanding sequencing project such as this. To complete the sequencing of a human genome will require the
determination of approximately 3 x 10^9, or 3 billion, base pairs.

The procedures typically used today for sequencing include the methods described in Sanger, et al., Proc. Natl.
Acad. Sci. USA 74:5463-5467 (1977), and Maxam, et al., Methods in Enzymology 65:499-559 (1980). The Sanger
method utilizes enzymatic elongation with chain terminating dideoxy nucleotides. The Maxam and Gilbert method uses
chemical reactions exhibiting specificity of reactants to generate nucleotide specific cleavages. Both methods, however,
require a practitioner to perform a large number of complex, manual manipulations. For example, such methods usually
require the isolation of homogeneous DNA fragments, elaborate and tedious preparation of samples, preparation of a
separating gel, application of samples to the gel, electrophoresing the samples on the gel, working up the finished gel,
and analysis of the results of the procedure.

Alternative techniques have been proposed for sequencing a nucleic acid. PCT patent Publication No. 92/10588,
incorporated herein by reference for all purposes, describes one improved technique in which the sequence of a labeled,
target nucleic acid is determined by hybridization to an array of nucleic acid probes on a substrate. Each probe is located
at a positionally distinguishable location on the substrate. When the labeled target is exposed to the substrate, it binds
at locations that contain complementary nucleotide sequences. Through knowledge of the sequence of the probes at
the binding locations, one can determine the nucleotide sequence of the target nucleic acid. The technique is particularly
efficient when very large arrays of nucleic acid probes are utilized. Such arrays can be formed according to the techniques
described in U.S. Patent No. 5,143,854 issued to Pirrung, et al. See also, U.S. application Serial No. 07/805,727, both
of which are incorporated herein by reference for all purposes.

When the nucleic acid probes are of a length shorter than the target, one can employ a reconstruction technique
to determine the sequence of the larger target based on affinity data from the shorter probes. See, U.S. Patent No.
5,202,231 issued to Drmanac, et al., and PCT patent Publication No. 89/10977 issued to Southern. One technique for
overcoming this difficulty has been termed sequencing by hybridization or SBH. Assume, for example, that a 12-mer
target DNA, i.e., 5'-AGCCTAGCTGAA, is mixed with an array of all octanucleotide probes. If the target binds only to
those probes having an exactly complementary nucleotide sequence, only five of the 65,536 octamer probes (i.e., 3'-
TCGGATCG, CGGATCGA, GGATCGAC, GATCGACT and ATCGACTT) will hybridize to the target. Alignment of the
overlapping sequences from the hybridizing probes reconstructs the complement of the original 12-mer target:

TCGGATCG
 CGGATCGA
  GGATCGAC
   GATCGACT
    ATCGACTT
TCGGATCGACTT
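
The reconstruction in the example above can be sketched in code. The following Python sketch is illustrative only and is not part of the disclosed method: the function names are hypothetical, hybridization is idealized as exact matching, and each overlap between probes is assumed to be unique.

```python
# Illustrative sketch of sequencing by hybridization (SBH) for the 12-mer example.
# Assumptions: exact-match hybridization only, and unique (k-1)-base overlaps.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Return the base-by-base complement, written in the opposite polarity.

    For a target written 5'->3', the result reads 3'->5' (letters are not reversed).
    """
    return "".join(COMPLEMENT[base] for base in seq)


def hybridizing_probes(target, k=8):
    """List the k-mer probes that are exactly complementary to some window of the target."""
    comp = complement(target)
    return [comp[i:i + k] for i in range(len(comp) - k + 1)]


def reconstruct(probes):
    """Rebuild the complement of the target from an unordered collection of probes.

    Each probe must overlap its successor by k-1 bases.
    """
    k = len(probes[0])
    remaining = set(probes)
    suffixes = {p[1:] for p in remaining}
    # The first probe is the one whose (k-1)-base prefix is not the suffix of any probe.
    start = next(p for p in remaining if p[:-1] not in suffixes)
    remaining.remove(start)
    sequence = start
    while remaining:
        # Find the probe whose prefix matches the last k-1 bases assembled so far.
        nxt = next(p for p in remaining if p[:-1] == sequence[-(k - 1):])
        sequence += nxt[-1]
        remaining.remove(nxt)
    return sequence


if __name__ == "__main__":
    target = "AGCCTAGCTGAA"
    probes = hybridizing_probes(target)
    print(4 ** 8)               # number of distinct octamer probes on the array
    print(probes)               # the five probes expected to hybridize
    print(reconstruct(probes))  # complement of the 12-mer target
```

Under these assumptions, a target of length n yields n - k + 1 hybridizing k-mers, which for the 12-mer and octamer probes gives the five probes listed above out of 4^8 = 65,536 possible octamers.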



Although such techniques have been quite useful, it would be helpful to have additional methods which can 
effectively discriminate between fully complementary hybrids and those that differ by one or more base pairs. 

In addition to knowing the genetic sequences of the nucleotides which encode the enzymes, structural proteins and
other effectors of biological functions, it is important to know how such species interact. A number of biochemical
processes involve the interaction of some species, e.g., a drug, a peptide or protein, or RNA, with double-stranded DNA.
For example, protein/DNA binding interactions are involved with a number of transcription factors as well as with tumor
suppression associated with the p53 protein and the genes contributing to a number of cancer conditions. As such, it
would be advantageous to have methods for preparing libraries of diverse double-stranded nucleic acid sequences and
probes which can be used, for example, in screening studies for the determination of binding affinity exhibited by binding
proteins, drugs or RNA.

Methods of synthesizing desired single stranded DNA sequences are well known to those of skill in the art. In
particular, methods of synthesizing oligonucleotides are found in, for example, Oligonucleotide Synthesis: A Practical
Approach, Gait, ed., IRL Press, Oxford (1984), incorporated herein by reference in its entirety for all purposes.
Synthesizing unimolecular double-stranded DNA in solution has also been described. See, Durand, et al., Nucleic Acids Res.
18:6353-6359 (1990) and Thomson, et al., Nucleic Acids Res. 21:5600-5603 (1993), the disclosures of both being
incorporated herein by reference.

Solid phase synthesis of biological polymers has been evolving since the early "Merrifield" solid phase peptide
synthesis, described in Merrifield, J. Am. Chem. Soc. 85:2149-2154 (1963), incorporated herein by reference for all
purposes. Solid-phase synthesis techniques have been provided for the synthesis of several peptide sequences on, for
example, a number of "pins." See, e.g., Geysen, et al., J. Immun. Meth. 102:259-274 (1987), incorporated herein by
reference for all purposes. Other solid-phase techniques involve, for example, synthesis of various peptide sequences
on different cellulose disks supported in a column. See, Frank and Doring, Tetrahedron 44:6031-6040 (1988),
incorporated herein by reference for all purposes. Still other solid-phase techniques are described in U.S. Patent No. 4,728,502
issued to Hamill and WO 90/00626 (Beattie, inventor). Unfortunately, each of these techniques produces only a relatively
low density array of polymers. For example, the technique described in Geysen, et al., is limited to producing 96 different
polymers on pins spaced in the dimensions of a standard microtiter plate.

Improved methods of forming large arrays of oligonucleotides, peptides and other polymer sequences in a short
period of time have been devised. Of particular note, Pirrung, et al., U.S. Patent No. 5,143,854 (see also PCT Application
No. WO 90/15070) and Fodor, et al., PCT Publication No. WO 92/10092, all incorporated herein by reference, disclose
methods of forming vast arrays of peptides, oligonucleotides and other polymer sequences using, for example,
light-directed synthesis techniques. See also, Fodor, et al., Science, 251:767-777 (1991), incorporated herein by reference
for all purposes. These procedures are now referred to as VLSIPS™ procedures.

More particularly, in the Fodor, et al., PCT application, an elegant method is described for using a computer-controlled
system to direct a VLSIPS™ procedure. Using this approach, one heterogeneous array of polymers is converted,
through simultaneous coupling at a number of reaction sites, into a different heterogeneous array. See, U.S. Application
Serial Nos. 07/796,243 and 07/980,523, the disclosures of which are incorporated herein for all purposes.

Although such techniques have been quite useful, it would be advantageous to have additional methods for preparing
libraries of diverse double-stranded nucleic acid sequences and probes which can be used, for example, in screening
studies for the determination of binding affinity exhibited by binding proteins, drugs or RNA.

SUMMARY OF THE INVENTION 

In one embodiment, the present invention provides methods of using nuclease treatment to improve the quality of
hybridization signals on high density oligonucleotide arrays. More particularly, in one such method, an array of
oligonucleotides is combined with a labelled target nucleic acid to form target-oligonucleotide hybrid complexes. Thereafter, the
target-oligonucleotide hybrid complexes are treated with a nuclease and, in turn, the array of target-oligonucleotide
complexes is washed to remove non-perfectly complementary target-oligonucleotide hybrid complexes. Following
nuclease treatment, the target:oligonucleotide hybrid complexes which are perfectly complementary are more readily
identified. From the location of the labelled targets, the oligonucleotide probes which hybridized with the targets can be
identified and, in turn, the sequence of the target nucleic acid can be more readily determined or verified.

In another embodiment, the present invention provides methods wherein ligation reactions are used to discriminate
between fully complementary hybrids and those that differ by one or more base pairs. In one such method, an array of
oligonucleotides is generated on a substrate (in the 3' to 5' direction) using any one of the methods described herein.
Each of the oligonucleotides in the array is shorter in length than the target nucleic acid so that when hybridized to the
target nucleic acid, the target nucleic acid generally has a 3' overhang. In this embodiment, the target nucleic acid is not
necessarily labelled. After the array of oligonucleotides has been combined with the target nucleic acid to form
target-oligonucleotide hybrid complexes, the target-oligonucleotide hybrid complexes are contacted with a ligase and a labelled,
ligatable probe or, alternatively, with a pool of labelled, ligatable probes. The ligation reaction of the labelled, ligatable
probes to the 5' end of the oligonucleotide probes on the substrate will occur, in the presence of the ligase, only when
the target:oligonucleotide hybrid has formed with correct base-pairing near the 5' end of the oligonucleotide probe and
where there is a suitable 3' overhang of the target nucleic acid to serve as a template for hybridization and ligation. After
the ligation reaction, the substrate is washed (multiple times if necessary) with water at a temperature of about 40°C to
50°C to remove the unbound target nucleic acid and the labelled, unligated probes. Thereafter, a quantitative
fluorescence image of the hybridization pattern is obtained by scanning the substrate with, for example, a confocal microscope,
and labelled oligonucleotide probes, i.e., the oligonucleotide probes which are perfectly complementary to the target
nucleic acid, are identified. Using this information, the sequence of the target nucleic acid can be more readily determined
or verified.

In a further embodiment, the present invention provides libraries of unimolecular, double-stranded oligonucleotides.
Each member of the library is comprised of a solid support, an optional spacer for attaching the double-stranded
oligonucleotide to the support and for providing sufficient space between the double-stranded oligonucleotide and the solid
support for subsequent binding studies and assays, an oligonucleotide attached to the spacer and further attached to
a second complementary oligonucleotide by means of a flexible linker, such that the two oligonucleotide portions exist
in a double-stranded configuration. More particularly, the members of the libraries of the present invention can be
represented by the formula:

Y-L^1-X^1-L^2-X^2

in which Y is a solid support, L^1 is a bond or a spacer, L^2 is a flexible linking group, and X^1 and X^2 are a pair of
complementary oligonucleotides. In a specific aspect of the invention, the library of different unimolecular, double-stranded
oligonucleotides can be used for screening a sample for a species which binds to one or more members of the library.
In yet another embodiment, the present invention provides a library of different conformationally-restricted probes
attached to a solid support. The individual members each have the formula:

-X^11-Z-X^12

in which X^11 and X^12 are complementary oligonucleotides and Z is a probe having sufficient length such that X^11 and
X^12 form a double-stranded oligonucleotide portion of the member and thereby restrict the conformations available to
the probe. In a specific aspect of the invention, the library of different conformationally-restricted probes can be used
for screening a sample for a species which binds to one or more probes in the library.
In yet another embodiment, the present invention provides libraries of intermolecular, doubly-anchored,
double-stranded oligonucleotides, each member of the library having the formula:



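[Structural formula: solid support Y bearing X^1 attached through L^1 and X^2 attached through L^2, with a dashed line between X^1 and X^2]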



in which Y represents a solid support, X^1 and X^2 represent a pair of complementary or partially complementary
oligonucleotides, and L^1 and L^2 each represent a bond or a spacer. Typically, L^1 and L^2 are the same and are spacers having
sufficient length such that X^1 and X^2 can form a double-stranded oligonucleotide. The non-covalent binding which exists
between X^1 and X^2 is represented by the dashed line.

According to yet another aspect of the present invention, methods and devices for the bioelectronic detection of
duplex formation are provided.

According to still another aspect of the invention, an adhesive is provided which comprises two surfaces of
complementary oligonucleotides.

A further understanding of the nature and advantages of the inventions herein may be realized by reference to the 
remaining portions of the specification and the attached drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates discrimination of non-perfectly complementary target:oligonucleotide hybrids using RNase A.
FIG. 2 illustrates discrimination of non-perfectly complementary target:oligonucleotide hybrids using a ligation
reaction.
FIG. 3 illustrates the light directed synthesis of an array of oligonucleotides on a substrate.
FIG. 4 illustrates a hybridization procedure which can be used prior to nuclease treatment.
FIG. 5 illustrates probe tiling strategy used to generate the probes.
FIG. 6 illustrates the results obtained from hybridization to the substrate without RNase treatment.
FIG. 7 illustrates the results obtained from hybridization to the substrate with RNase treatment.
FIG. 8 illustrates a method for improving the sequencing of the 5' end of a randomly fragmented target using 2
ligation reactions.
FIGS. 9A to 9F illustrate the preparation of a member of a library of surface-bound, unimolecular double-stranded
DNA as well as binding studies with receptors having specificity for either the double stranded DNA portion, a probe
which is held in a conformationally restricted form by DNA scaffolding, or a bulge or loop region of RNA.
FIGS. 10A to 10F illustrate the preparation of several different types of intermolecular, doubly-anchored,
double-stranded oligonucleotides.

FIG. 11 illustrates the basic tiling strategy. The figure illustrates the relationship between an interrogation position
(I) and a corresponding nucleotide (n) in the reference sequence, and between a probe from the first probe set and
corresponding probes from second, third and fourth probe sets.
FIG. 12 illustrates the segment of complementarity in a probe from the first probe set.
FIG. 13 illustrates the incremental succession of probes in a basic tiling strategy. The figure shows four probe sets,
each having three probes. Note that each probe differs from its predecessor in the same set by the acquisition of a 5'
nucleotide and the loss of a 3' nucleotide, as well as in the nucleotide occupying the interrogation position.
FIG. 14A illustrates the exemplary arrangement of lanes on a chip. The chip shows four probe sets, each having
five probes and each having a total of five interrogation positions (I1 - I5), one per probe.
FIG. 14B illustrates a tiling strategy for analyzing closely spaced mutations.
FIG. 14C illustrates a tiling strategy for avoiding loss of signal due to probe self-annealing.
FIG. 15 illustrates a hybridization pattern of a chip having probes laid down in lanes. Dark patches indicate
hybridization. The probes in the lower part of the figure occur at the column of the array indicated by the arrow when the probe
length is 15 and the interrogation position is 7.
FIG. 16 illustrates the block tiling strategy. The perfectly matched probe has three interrogation positions. The probes
from the other probe sets have only one of these interrogation positions.
FIGS. 17A to 17C illustrate methods which can be used to prepare single-stranded nucleic acid sequences.

DETAILED DESCRIPTION OF THE INVENTION AND PREFERRED EMBODIMENTS

TABLE OF CONTENTS 

I. Glossary
II. General Overview
III. Methods For Generating An Array Of Oligonucleotides On A Substrate
IV. Sequencing By Hybridization Using the Probe Tiling Strategy
VI. Detection Methods
VII. Applications
VIII. Libraries of Unimolecular, Double-Stranded Oligonucleotides
X. Libraries of Intermolecular, Doubly-Anchored, Double-Stranded Oligonucleotides
XI. Methods of Library Screening
XII. Bioelectric Devices and Methods
XIII. Alternative Embodiments
XIV. Examples
XV. Conclusion

I. Glossary 

The following terms are intended to have the following general meanings as they are used herein:

1. Substrate: A material having a rigid or semi-rigid surface. In many embodiments, at least one surface of the
substrate will be substantially flat, although in some embodiments it may be desirable to physically separate
synthesis regions for different polymers with, for example, wells, raised regions, etched trenches, or the like. In some
embodiments, the substrate itself contains wells, trenches, flow through regions, etc., which form all or part of the
synthesis regions. According to other embodiments, small beads may be provided on the surface, and compounds
synthesized thereon may be released upon completion of the synthesis.

2. Predefined Region: A predefined region is a localized area on a substrate which is, was, or is intended to be used
for formation of a selected polymer and is otherwise referred to herein in the alternative as a "reaction" region, a
"selected" region, or simply a "region." The predefined region may have any convenient shape, e.g., circular,
rectangular, elliptical, wedge-shaped, etc. In some embodiments, a predefined region and, therefore, the area upon
which each distinct polymer sequence is synthesized is smaller than about 1 cm^2, more preferably less than 1 mm^2,
and still more preferably less than 0.5 mm^2. In most preferred embodiments, the regions have an area less than
about 10,000 µm^2 or, more preferably, less than 100 µm^2. Within these regions, the polymer synthesized therein is
preferably synthesized in a substantially pure form.

3. Substantially Pure: A polymer or other compound is considered to be "substantially pure" when it exhibits
characteristics that distinguish it from the polymers or compounds in other regions. For example, purity can be measured
in terms of the activity or concentration of the compound of interest. Preferably the compound in a region is sufficiently
pure such that it is the predominant species in the region. According to certain aspects of the invention, the compound
is 5% pure, more preferably more than 10% pure, and most preferably more than 20% pure. According to more
preferred aspects of the invention, the compound is greater than 80% pure, preferably more than 90% pure, and
more preferably more than 95% pure, where purity for this purpose refers to the ratio of the number of compound
molecules formed in a region having a desired structure to the total number of non-solvent molecules in the region.
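
The purity ratio defined above can be written as a short calculation. The following Python sketch is illustrative only; the function name and the molecule counts are hypothetical and do not come from the specification.

```python
# Illustrative sketch: purity as the ratio of molecules having the desired
# structure to all non-solvent molecules in a region (hypothetical names).

def purity(desired_molecules, non_solvent_molecules):
    """Return purity as a fraction between 0 and 1."""
    if non_solvent_molecules <= 0:
        raise ValueError("the region must contain at least one non-solvent molecule")
    if not 0 <= desired_molecules <= non_solvent_molecules:
        raise ValueError("desired molecules cannot exceed the non-solvent total")
    return desired_molecules / non_solvent_molecules


if __name__ == "__main__":
    # Hypothetical counts: 950 molecules of the desired structure out of 1000.
    fraction = purity(950, 1000)
    print(f"{fraction:.0%} pure")
    print(fraction > 0.80)  # compares against the "greater than 80% pure" level
```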

4. Monomer: In general, a monomer is any member of the set of molecules which can be joined together to form
an oligomer or polymer. The set of monomers useful in the present invention includes, but is not restricted to, for
the example of oligonucleotide synthesis, the set of nucleotides consisting of adenine, thymine, cytosine, guanine,
and uridine (A, T, C, G, and U, respectively) and synthetic analogs thereof. As used herein, monomers refers to any
member of a basis set for synthesis of an oligomer. Different basis sets of monomers may be used at successive
steps in the synthesis of a polymer.

5. Oligomer or Polymer: The oligomer or polymer sequences of the present invention are formed from the chemical
or enzymatic addition of monomer subunits. Such oligomers include, for example, both linear, cyclic, and branched
polymers of nucleic acids, polysaccharides, phospholipids, and peptides having either α-, β-, or ω-amino acids,
heteropolymers in which a known drug is covalently bound to any of the above, polyurethanes, polyesters,
polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides, polyacetates,
or other polymers which will be readily apparent to one skilled in the art upon review of this disclosure. As used
herein, the term oligomer or polymer is meant to include such molecules as β-turn mimetics, prostaglandins and
benzodiazepines which can also be synthesized in a stepwise fashion on a solid support.

6. Peptide: A peptide is an oligomer in which the monomers are amino acids which are joined together through
amide bonds, and is alternatively referred to as a polypeptide. In the context of this specification it should be appreciated
that when α-amino acids are used, they may be the L-optical isomer or the D-optical isomer. Other amino acids
which are useful in the present invention include unnatural amino acids such as β-alanine, phenylglycine,
homoarginine and the like. Peptides are more than two amino acid monomers long, and often more than 20 amino
acid monomers long. Standard abbreviations for amino acids are used (e.g., P for proline). These abbreviations are
included in Stryer, Biochemistry, Third Ed., (1988), which is incorporated herein by reference for all purposes.

7. Oligonucleotides: An oligonucleotide is a single-stranded DNA or RNA molecule, typically prepared by synthetic
means. Alternatively, naturally occurring oligonucleotides, or fragments thereof, may be isolated from their natural
sources or purchased from commercial sources. Those oligonucleotides employed in the present invention will be
4 to 100 nucleotides in length, preferably from 6 to 30 nucleotides, although oligonucleotides of different length may
be appropriate. Suitable oligonucleotides may be prepared by the phosphoramidite method described by Beaucage
and Carruthers, Tetrahedron Lett., 22:1859-1862 (1981), or by the triester method according to Matteucci, et al., J.
Am. Chem. Soc., 103:3185 (1981), both incorporated herein by reference, or by other chemical methods using either
a commercial automated oligonucleotide synthesizer or VLSIPS™ technology (discussed in detail below). When
oligonucleotides are referred to as "double-stranded," it is understood by those of skill in the art that a pair of
oligonucleotides exist in a hydrogen-bonded, helical array typically associated with, for example, DNA. In addition to the
100% complementary form of double-stranded oligonucleotides, the term "double-stranded" as used herein is also
meant to refer to those forms which include such structural features as bulges and loops, described more fully in
such biochemistry texts as Stryer, Biochemistry, Third Ed., (1988), previously incorporated herein by reference for
all purposes.

8. Chemical terms: As used herein, the term "alkyl" refers to a saturated hydrocarbon radical which may be
straight-chain or branched-chain (for example, ethyl, isopropyl, t-amyl, or 2,5-dimethylhexyl). When "alkyl" or "alkylene" is
used to refer to a linking group or a spacer, it is taken to be a group having two available valences for covalent
attachment, for example, -CH2CH2-, -CH2CH2CH2-, -CH2CH2CH(CH3)CH2- and -CH2(CH2CH2)2CH2-.
Preferred alkyl groups as substituents are those containing 1 to 10 carbon atoms, with those containing 1 to 6 carbon
atoms being particularly preferred. Preferred alkyl or alkylene groups as linking groups are those containing 1 to 20
carbon atoms, with those containing 3 to 6 carbon atoms being particularly preferred. The term "polyethylene glycol"
is used to refer to those molecules which have repeating units of ethylene glycol, for example, hexaethylene glycol
(HO-(CH2CH2O)5-CH2CH2OH). When the term "polyethylene glycol" is used to refer to linking groups and spacer
groups, it would be understood by one of skill in the art that other polyethers or polyols could be used as well (i.e.,
polypropylene glycol or mixtures of ethylene and propylene glycols). The following abbreviations are used herein:
phi, phenanthrenequinone diimine; phen', 5-amido-glutaric acid-1,10-phenanthroline; dppz, dipyridophenazine.

9. Protective Group: As used herein, the term "protecting group" refers to any of the groups which are designed to
block one reactive site in a molecule while a chemical reaction is carried out at another reactive site. More particularly,
the protecting groups used herein can be any of those groups described in Greene, et al., Protective Groups In
Organic Chemistry, 2nd Ed., John Wiley & Sons, New York, NY, 1991, incorporated herein by reference. The proper
selection of protecting groups for a particular synthesis will be governed by the overall methods employed in the
synthesis. For example, in "light-directed" synthesis, discussed below, the protecting groups will be photolabile
protecting groups such as NVOC, MeNPOC, and those disclosed in co-pending Application PCT/US93/10162 (filed
October 22, 1993), incorporated herein by reference. In other methods, protecting groups may be removed by
chemical methods and include groups such as FMOC, DMT and others known to those of skill in the art.

10. Complementary or substantially complementary: Refers to the hybridization or base pairing between nucleotides
or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule, or between
an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified.
Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single stranded RNA or DNA
molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and
compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% of the nucleotides
of the other strand, usually at least about 90% to 95%, and more preferably from about 98 to 100%. Alternatively,
substantial complementarity exists when an RNA or DNA strand will hybridize under selective hybridization
conditions to its complement. Typically, selective hybridization will occur when there is at least about 65% complementarity
over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90%
complementarity. See, M. Kanehisa, Nucleic Acids Res. 12:203 (1984), incorporated herein by reference.
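
The percentage thresholds in this definition can be illustrated with a small calculation. The following Python sketch is a simplification and not the method of the specification: it assumes the two strands are already aligned, are of equal length, and contain no insertions or deletions, and the function names are hypothetical.

```python
# Illustrative sketch: percent complementarity between two pre-aligned strands.
# Assumptions: equal length, no gaps; strand_a is written 5'->3' and strand_b 3'->5'.

WATSON_CRICK_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
}


def percent_complementarity(strand_a, strand_b):
    """Return the percentage of aligned positions that form A-T, A-U or C-G pairs."""
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum((a, b) in WATSON_CRICK_PAIRS for a, b in zip(strand_a, strand_b))
    return 100.0 * paired / len(strand_a)


def is_substantially_complementary(strand_a, strand_b, threshold=80.0):
    """Apply the 'at least about 80%' level from the definition as a hard cutoff."""
    return percent_complementarity(strand_a, strand_b) >= threshold


if __name__ == "__main__":
    print(percent_complementarity("AGCCTAGCTGAA", "TCGGATCGACTT"))   # fully paired
    print(is_substantially_complementary("AGCCTAGCTG", "TCGGATCGAG"))  # one mismatch in ten
```

A fuller treatment would first compute an optimal alignment that allows insertions and deletions, as the definition contemplates; the fixed cutoff here also ignores the word "about" in the stated thresholds.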

11. Stringent hybridization conditions: Such conditions will typically include salt concentrations of less than about
1 M, more usually less than about 500 mM, and preferably less than about 200 mM. Hybridization temperatures
can be as low as 5°C, but are typically greater than 22°C, more typically greater than about 30°C, and preferably in
excess of about 37°C. Longer fragments may require higher hybridization temperatures for specific hybridization.
As other factors may dramatically affect the stringency of hybridization, including base composition, length of the
complementary strands, presence of organic solvents and extent of base mismatching, the combination of
parameters is more important than the absolute measure of any one alone.

12. Epitope: The portion of an antigen molecule which is delineated by the area of interaction with the subclass of
receptors known as antibodies.

13. Identifier tag: A means whereby one can identify which molecules have experienced a particular reaction in the
synthesis of an oligomer. The identifier tag also records the step in the synthesis series in which the molecules
experienced that particular monomer reaction. The identifier tag may be any recognizable feature which is, for
example: microscopically distinguishable in shape, size, color, optical density, etc.; differently absorbing or emitting of
light; chemically reactive; magnetically or electronically encoded; or in some other way distinctively marked with the
required information. A preferred example of such an identifier tag is an oligonucleotide sequence.

14. Ligand/Probe: A ligand is a molecule that is recognized by a particular receptor. The agent bound by or reacting
with a receptor is called a ligand, a term which is definitionally meaningful only in terms of its counterpart receptor.
The term "ligand" does not imply any particular molecular size or other structural or compositional feature other than
that the substance in question is capable of binding or otherwise interacting with the receptor. Also, a ligand may
serve either as the natural ligand to which the receptor binds, or as a functional analogue that may act as an agonist
or antagonist. Examples of ligands that can be investigated by this invention include, but are not restricted to, agonists
and antagonists for cell membrane receptors, toxins and venoms, viral epitopes, hormones (e.g., opiates, steroids,
etc.), hormone receptors, peptides, enzymes, enzyme substrates, substrate analogs, transition state analogs,
cofactors, drugs, proteins, and antibodies. The term "probe" refers to those molecules which are expected to act like
ligands but for which binding information is typically unknown. For example, if a receptor is known to bind a ligand
which is a peptide β-turn, a "probe" or library of probes will be those molecules designed to mimic the peptide
β-turn. In instances where the particular ligand associated with a given receptor is unknown, the term probe refers to
those molecules designed as potential ligands for the receptor.

15. Receptor: A molecule that has an affinity for a given ligand or probe. Receptors may be naturally-occurring or
manmade molecules. Also, they can be employed in their unaltered natural or isolated state or as aggregates with
other species. Receptors may be attached, covalently or noncovalently, to a binding member, either directly or via
a specific binding substance. Examples of receptors which can be employed by this invention include, but are not
restricted to, antibodies, cell membrane receptors, monoclonal antibodies and antisera reactive with specific
antigenic determinants (such as on viruses, cells or other materials), drugs, polynucleotides, nucleic acids, peptides,
cofactors, lectins, sugars, polysaccharides, cells, cellular membranes, and organelles. Receptors are sometimes
referred to in the art as anti-ligands. As the term receptors is used herein, no difference in meaning is intended. A
"ligand-receptor pair" is formed when two molecules have combined through molecular recognition to form a
complex. Other examples of receptors which can be investigated by this invention include but are not restricted to:

a) Microorganism receptors: Determination of ligands or probes that bind to receptors, such as specific transport
proteins or enzymes essential to survival of microorganisms, is useful in developing a new class of antibiotics. Of
particular value would be antibiotics against opportunistic fungi, protozoa, and those bacteria resistant to the
antibiotics in current use.

b) Enzymes: For instance, the binding site of enzymes such as the enzymes responsible for cleaving
neurotransmitters. Determination of ligands or probes that bind to certain receptors, and thus modulate the action of
the enzymes that cleave the different neurotransmitters, is useful in the development of drugs that can be used
in the treatment of disorders of neurotransmission.

c) Antibodies: For instance, the invention may be useful in investigating the ligand-binding site on the antibody
molecule which combines with the epitope of an antigen of interest. Determining a sequence that mimics an
antigenic epitope may lead to the development of vaccines of which the immunogen is based on one or more
of such sequences, or lead to the development of related diagnostic agents or compounds useful in therapeutic
treatments such as for autoimmune diseases (e.g., by blocking the binding of the "self" antibodies).

d) Nucleic Acids: The invention may be useful in investigating sequences of nucleic acids acting as binding sites
for cellular proteins ("trans-acting factors"). Such sequences may include, e.g., transcription factors, suppressors,
enhancers or promoter sequences.

e) Catalytic Polypeptides: Polymers, preferably polypeptides, which are capable of promoting a chemical reaction
involving the conversion of one or more reactants to one or more products. Such polypeptides generally
include a binding site specific for at least one reactant or reaction intermediate and an active functionality
proximate to the binding site, which functionality is capable of chemically modifying the bound reactant. Catalytic
polypeptides are described in Lerner, R.A. et al., Science 252: 659 (1991), which is incorporated herein by
reference.

f) Hormone receptors: For instance, the receptors for insulin and growth hormone. Determination of the ligands
which bind with high affinity to a receptor is useful in the development of, for example, an oral replacement of
the daily injections which diabetics must take to relieve the symptoms of diabetes, and in the other case, a
replacement for the scarce human growth hormone that can only be obtained from cadavers or by recombinant
DNA technology. Other examples are the vasoconstrictive hormone receptors; determination of those ligands
that bind to a receptor may lead to the development of drugs to control blood pressure.
g) Opiate receptors: Determination of ligands that bind to the opiate receptors in the brain is useful in the
development of less-addictive replacements for morphine and related drugs.

16. Synthetic: Produced by in vitro chemical or enzymatic synthesis. The synthetic libraries of the present invention
may be contrasted with those in viral or plasmid vectors, for instance, which may be propagated in bacterial, yeast,
or other living hosts.

17. Probe: A molecule of known composition or monomer sequence, typically formed on a solid surface, which is
or may be exposed to a target molecule and examined to determine if the probe has hybridized to the target. Also
referred to herein as an "oligonucleotide" or an "oligonucleotide probe."

18. Target: A molecule, typically of unknown composition or monomer sequence, for which it is desired to study the
composition or monomer sequence. A target may be a part of a larger molecule, such as a few bases in a longer
nucleic acid.

19. A, T, C, G, U: A, T, C, G, and U are abbreviations for the nucleotides adenine, thymine, cytosine, guanine, and
uridine, respectively.

20. Array or Chip or Library: A collection of oligonucleotide probes of predefined nucleotide sequence, often formed
in one or more substrates, which are used in hybridization studies of target nucleic acids.

II. General Overview

In one embodiment, the present invention provides improved methods for obtaining sequence information about
nucleic acids (i.e., oligonucleotides). More particularly, the present invention provides improved methods for
discriminating between fully complementary hybrids and those that differ by one or more base pairs. The methods of the
present invention rely, in part, on the ability to synthesize or attach specific oligonucleotides at known locations on a
substrate, typically a single substrate. Such oligonucleotides are capable of interacting with specific target nucleic acid
while attached to the substrate. By appropriate labeling of these targets, the sites of the interactions between the target
and the specific oligonucleotide can be derived. Moreover, because the oligonucleotides are positionally defined, the
target sequence can be reconstructed from the sites of the interactions.

It has now been determined that reconstruction of the target sequence can be improved by using various enzymes
that catalyze oligonucleotide cleavage and ligation reactions. More particularly, it has been determined that
discrimination between fully complementary hybrids and those that differ by one or more base pairs can be greatly
enhanced by using various enzymes that catalyze oligonucleotide cleavage and ligation reactions.

RNase A treatment, for example, can be used to improve the quality of RNA hybridization signals on high density
oligonucleotide arrays. After the array of oligonucleotides has been combined with a target nucleic acid (RNA) to form
target-oligonucleotide hybrid complexes, the target-oligonucleotide hybrid complexes are treated with RNase A to remove
non-perfectly complementary target-oligonucleotide hybrid complexes. RNase A recognizes and cuts single-stranded
RNA, including RNA in RNA:DNA hybrids that is not in a perfect double-stranded structure. As illustrated in FIG. 1, RNA
bulges, loops, and even single base mismatches can be recognized and cleaved by RNase A. Similarly, treatment with
other nucleases (e.g., S1 nuclease and Mung Bean nuclease) can be used to improve the DNA hybridization signals on
high density oligonucleotide arrays. As such, nuclease treatment can be used to improve the quality of hybridization
signals on high density oligonucleotide arrays and, in turn, to more accurately determine the sequence, or monitor
mutations, or resequence the target nucleic acid.

Moreover, ligation reactions can be used to discriminate between fully complementary hybrids and those that differ
by one or more base pairs. T4 DNA ligase, for example, can be used to identify DNA:DNA hybrids that are perfectly
complementary near the 5' end of the immobilized oligonucleotide probes. The ligation reaction of labelled, short
oligonucleotides to the 5' end of oligonucleotide probes on a substrate will occur, in the presence of a ligase, only when a
target:oligonucleotide hybrid has formed with correct base-pairing near the 5' end of the oligonucleotide probe and where
there is a suitable 3' overhang of the target to serve as a template for hybridization and ligation. As such, after the array
of oligonucleotides has been combined with a target nucleic acid to form target-oligonucleotide hybrid complexes, the
target-oligonucleotide hybrid complexes can be contacted with a ligase and a labelled, ligatable oligonucleotide probe.
After the ligation reaction, the substrate is washed to remove the target nucleic acid and labelled, unligated
oligonucleotide probes. The oligonucleotide probes containing the label indicate sequences which are perfectly complementary
to target nucleic acid sequence. As such, as illustrated in FIG. 2, ligation reactions can be used to improve discrimination
of base-pair mismatches near the 5' end of the probe, mismatches that are often poorly discriminated following
hybridization alone.

In addition to providing improved methods for discriminating between fully complementary hybrids and those that
differ by one or more base pairs, the present invention provides methods for the preparation of high-density arrays of
diverse unimolecular and intramolecular double-stranded oligonucleotides, as well as arrays of conformationally
restricted probes. The broad concept of such arrays is illustrated in FIG. 9. FIGS. 9A, 9B and 9C illustrate the preparation
of surface-bound unimolecular double stranded DNA, while FIGS. 9D, 9E and 9F illustrate uses for the libraries of the
present invention.

FIG. 9A shows a solid support 1 having an attached spacer 2, which is optional. Attached to the distal end of the
spacer is a first oligomer 3, which can be attached as a single unit or synthesized on the support or spacer in a monomer
by monomer approach. FIG. 9B shows a subsequent stage in the preparation of one member of a library according to
the present invention. In this stage, a flexible linker 4 is attached to the distal end of the oligomer 3. In other embodiments,
the flexible linker will be a probe. FIG. 9C shows the completed surface-bound unimolecular double stranded DNA which
is one member of a library, wherein a second oligomer 5 is now attached to the distal end of the flexible linker (or probe).
As shown in FIG. 9C, the length of the flexible linker (or probe) 4 is sufficient such that the first and second oligomers
(which are complementary) exist in a double-stranded conformation. It will be appreciated by one of skill in the art that
the libraries of the present invention will contain multiple, individually synthesized members which can be screened for
various types of activity. Three such binding events are illustrated in FIGS. 9D, 9E and 9F.

In FIG. 9D, a receptor 6, which can be a protein, RNA molecule or other molecule which is known to bind to DNA,
is introduced to the library. Determining which member of a library binds to the receptor provides information which is
useful for diagnosing diseases, sequencing DNA or RNA, identifying drugs and/or proteins that bind DNA, identifying
genetic characteristics, or in other drug discovery endeavors.

In FIG. 9E, the linker 4 is a probe for which binding information is sought. The probe is held in a conformationally
restricted manner by the flanking oligomers 3 and 5, which are present in a double-stranded conformation. As a result,
a library of conformationally restricted probes can be screened for binding activity with a receptor 7 which has specificity
for the probe.

The present invention also contemplates the preparation of libraries of unimolecular, double-stranded
oligonucleotides having bulges or loops in one of the strands as depicted in FIG. 9F. In FIG. 9F, one oligonucleotide 5 is shown as
having a bulge 8. Specific RNA bulges are often recognized by proteins (e.g., TAR RNA is recognized by the TAT protein
of HIV). Accordingly, libraries of RNA bulges or loops are useful in a number of diagnostic applications. One of skill in
the art will appreciate that the bulge or loop can be present in either oligonucleotide portion 3 or 5.

In another embodiment, the present invention provides libraries of intermolecular, doubly-anchored, double-stranded
oligonucleotides. The broad concept of this aspect of the invention is illustrated in FIG. 10. As with the above described
"unimolecular" aspect of the invention, FIG. 10A shows a solid support 11 having an attached spacer 12, which is
optional. Attached to the distal end of the spacer is a first oligomer 13, which can be attached as a single unit or
synthesized on the support or spacer in a monomer by monomer approach. FIG. 10B shows a subsequent stage in the
preparation of one member of a library according to the present invention. In this stage, a second oligomer 14, which is
complementary to the first oligomer 13, is attached to the solid support. The second oligomer can also be attached as
a single unit or synthesized on the support or spacer in a monomer by monomer approach. Typically, the first and second
oligomers are synthesized on the solid support in a protected form. Removal of the protecting groups provides a solid
support with complementary oligomers in close proximity which can form a completed intermolecular, doubly-anchored,
double stranded oligonucleotide (FIG. 10C). FIG. 10D shows one member of a library in which the first
self-complementary oligomer is 3'-AAAAATTTTT-5' and its identical neighboring oligomer is 3'-TTTTTAAAAA-5'. In other embodiments
of this aspect of the invention, the complementary oligomers will exhibit complementarity only over their respective
termini, as shown in FIG. 10E. It will be appreciated by one of skill in the art that the libraries of the present invention
will contain multiple, individually synthesized members which can be screened for various types of activity or which can
serve as templates for hybridization enhancement.

III. Methods For Generating An Array Of Oligonucleotides On A Substrate

A. The Substrate

In the methods of the present invention, an array of diverse oligonucleotides at known locations on a single substrate
surface is employed. Essentially, any conceivable substrate can be employed in the invention. The substrate can be
organic, inorganic, biological, nonbiological, or a combination of any of these, existing as beads, particles, strands,
precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates, slides, etc. The substrate
can have any convenient shape, such as a disc, square, sphere, circle, etc. The substrate is preferably flat, but may take
on a variety of alternative surface configurations. For example, the substrate may contain raised or depressed regions
on which the synthesis takes place. The substrate and its surface preferably form a rigid support on which to carry out
the reaction described herein. The substrate and its surface may also be chosen to provide appropriate light-absorbing
characteristics. The substrate may be any of a wide variety of materials including, for example, polymers, plastics, Pyrex,
quartz, resins, silicon, silica or silica-based materials, carbon, metals, inorganic glasses, inorganic crystals, membranes,
etc. More particularly, the substrate may, for instance, be a polymerized Langmuir Blodgett film, functionalized glass, Si,
Ge, GaAs, GaP, SiO2, SiN4, modified silicon, or any one of a wide variety of gels or polymers such as
(poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations thereof. Other substrate materials will
be readily apparent to those of skill in the art upon review of this disclosure. In a preferred embodiment the substrate is
flat glass or single-crystal silicon with surface relief features of less than 10 Å.

In some embodiments, a predefined region on the substrate and, therefore, the area upon which each distinct
material is synthesized will have a surface area of between about 1 cm² and 10⁻¹⁰ cm². In some embodiments, the regions
have areas of less than about 10⁻¹ cm², 10⁻² cm², 10⁻³ cm², 10⁻⁴ cm², 10⁻⁵ cm², 10⁻⁶ cm², 10⁻⁷ cm², 10⁻⁸ cm², or 10⁻¹⁰ cm².
In a preferred embodiment the regions are between about 10×10 µm and 500×100 µm.
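
As a quick check on the units, the preferred region dimensions of about 10×10 µm and 500×100 µm can be converted to areas in cm². The following is only an illustrative sketch; the helper name `region_area_cm2` is an assumption introduced here and is not part of the disclosure.

```python
UM_PER_CM = 1.0e4  # 1 cm = 10,000 micrometers


def region_area_cm2(width_um: float, height_um: float) -> float:
    """Area of a rectangular synthesis region, converted from um^2 to cm^2."""
    return (width_um / UM_PER_CM) * (height_um / UM_PER_CM)


small = region_area_cm2(10, 10)     # 10 x 10 um region
large = region_area_cm2(500, 100)   # 500 x 100 um region
print(f"{small:.1e} cm^2, {large:.1e} cm^2")
```

Under these assumptions the two regions come to roughly 10⁻⁶ cm² and 5×10⁻⁴ cm², respectively, both of which fall inside the broad range of region areas recited above.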

Moreover, in some embodiments, a single substrate supports more than about 10 different monomer sequences
and preferably more than about 100 different monomer sequences, although in some embodiments more than about
10³, 10⁴, 10⁵, 10⁶, 10⁷, or 10⁸ different sequences are provided on a substrate. Of course, within a region of the substrate
in which a monomer sequence is synthesized, it is preferred that the monomer sequence be substantially pure. In some
embodiments, regions of the substrate contain polymer sequences which are at least about 1%, 5%, 10%, 15%, 20%,
25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, or 99% pure.

As previously explained, the substrate is preferably flat, but may take on a variety of alternative surface
configurations. Regardless of the configuration of the substrate surface, it is imperative that the reactants used to generate an
array of oligonucleotides in the individual reaction regions be prevented from moving to adjacent reaction regions. Most
simply, this is ensured by chemically attaching the oligonucleotides to the substrate. Moreover, this can be ensured by
providing an appropriate barrier between the various reaction regions on the substrate. A mechanical device or physical
structure can be used to define the various regions on the substrate. For example, a wall or other physical barrier can
be used to prevent the reactants in the individual reaction regions from moving to adjacent reaction regions. Alternatively,
a dimple or other recess can be used to prevent the reactant components in the individual reaction regions from moving
to adjacent reaction regions.

B. Generating An Array Using Light-Directed Methods

An array of diverse oligonucleotides at known locations on a single substrate surface can
be formed using a variety of techniques known to those skilled in the art of polymer synthesis on solid supports. For
example, "light directed" methods (which are one technique in a family of methods known as VLSIPS™ methods) are
described in U.S. Patent No. 5,143,854, previously incorporated by reference. The light directed methods discussed in
the '854 patent involve activating predefined regions of a substrate or solid support and then contacting the substrate
with a preselected monomer solution. The predefined regions can be activated with a light source shown through a mask
(much in the manner of photolithography techniques used in integrated circuit fabrication). Other regions of the substrate
remain inactive because they are blocked by the mask from illumination and remain chemically protected. Thus, a light
pattern defines which regions of the substrate react with a given monomer. By repeatedly activating different sets of
predefined regions and contacting different monomer solutions with the substrate, a diverse array of polymers is produced
on the substrate. Of course, other steps such as washing unreacted monomer solution from the substrate can be used
as necessary. Other techniques include mechanical techniques such as those described in PCT No. 92/10183, USSN
07/796,243, also incorporated herein by reference for all purposes. Still further techniques include bead based
techniques such as those described in PCT US/93/04145, also incorporated herein by reference, and pin based methods
such as those described in U.S. Pat. No. 5,288,514, also incorporated herein by reference.

The VLSIPS™ methods are preferred for generating an array of oligonucleotides on a single substrate. The surface
of the solid support or substrate, optionally modified with spacers having photolabile protecting groups such as
NVOC and MeNPOC, is illuminated through a photolithographic mask, yielding reactive groups (typically hydroxyl groups)
in the illuminated regions. A 3'-O-phosphoramidite activated deoxynucleoside (protected at the 5'-hydroxyl with a
photolabile protecting group) is then presented to the surface and chemical coupling occurs at sites that were exposed to
light. Following capping and oxidation, the substrate is rinsed and the surface illuminated through a second mask, to
expose additional hydroxyl groups for coupling. A second 5'-protected, 3'-O-phosphoramidite activated deoxynucleoside
is presented to the surface. The selective photodeprotection and coupling cycles are repeated until the desired set of
oligonucleotides is produced.
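
The repeated mask-and-couple cycle can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the method of the '854 patent: each mask is modelled simply as the set of site indices left open to light, capping, oxidation and rinsing are omitted, and the function name `light_directed_synthesis` and the example masks are hypothetical.

```python
from typing import List, Set, Tuple


def light_directed_synthesis(n_sites: int,
                             cycles: List[Tuple[Set[int], str]]) -> List[str]:
    """Grow one oligomer per site. In each cycle the monomer couples only at
    the sites that the mask leaves exposed to light (and thus deprotected)."""
    sequences = [""] * n_sites
    for exposed_sites, monomer in cycles:
        for site in exposed_sites:
            # The incoming monomer carries its own photolabile group, so the
            # site is protected again until a later mask exposes it.
            sequences[site] += monomer
    return sequences


# Four cycles on a four-site substrate; sequences are listed in order of
# synthesis (first coupled monomer first).
cycles = [
    ({0, 1}, "A"),
    ({2, 3}, "C"),
    ({0, 2}, "G"),
    ({1, 3}, "T"),
]
array = light_directed_synthesis(4, cycles)
print(array)  # ['AG', 'AT', 'CG', 'CT']
```

In this sketch four coupling cycles yield four distinct dimers at known positions, which reflects how the choice of masks, rather than the number of reagents, determines the sequence at each site.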
C. Generating An Array Of Oligonucleotides Using Flow Channel Or Spotting Methods

In addition to the foregoing, additional methods which can be used to generate an array of oligonucleotides on a
single substrate are described in co-pending Applications Ser. No. 07/980,523, filed November 20, 1992, and 07/796,243,
filed November 22, 1991, incorporated herein by reference for all purposes. In the methods disclosed in these
applications, reagents are delivered to the substrate by either (1) flowing within a channel defined on predefined regions or (2)
"spotting" on predefined regions. However, other approaches, as well as combinations of spotting and flowing, may be
employed. In each instance, certain activated regions of the substrate are mechanically separated from other regions
when the monomer solutions are delivered to the various reaction sites.

A typical "flow channel" method applied to the compounds and libraries of the present invention can generally be
described as follows. Diverse polymer sequences are synthesized at selected regions of a substrate or solid support by
forming flow channels on a surface of the substrate through which appropriate reagents flow or in which appropriate
reagents are placed. For example, assume a monomer "A" is to be bound to the substrate in a first group of selected
regions. If necessary, all or part of the surface of the substrate in all or a part of the selected regions is activated for
binding by, for example, flowing appropriate reagents through all or some of the channels, or by washing the entire
substrate with appropriate reagents. After placement of a channel block on the surface of the substrate, a reagent having
the monomer A flows through or is placed in all or some of the channel(s). The channels provide fluid contact to the first
selected regions, thereby binding the monomer A on the substrate directly or indirectly (via a spacer) in the first selected
regions.

Thereafter, a monomer B is coupled to second selected regions, some of which may be included among the first
selected regions. The second selected regions will be in fluid contact with a second flow channel(s) through translation,
rotation, or replacement of the channel block on the surface of the substrate; through opening or closing a selected
valve; or through deposition of a layer of chemical or photoresist. If necessary, a step is performed for activating at least
the second regions. Thereafter, the monomer B is flowed through or placed in the second flow channel(s), binding
monomer B at the second selected locations. In this particular example, the resulting sequences bound to the substrate
at this stage of processing will be, for example, A, B, and AB. The process is repeated to form a vast array of sequences
of desired length at known locations on the substrate.
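
The way overlapping channel sets give rise to A, B, and AB can be sketched as follows. This is an illustrative model only: the substrate is assumed to be a 3×3 grid of reaction regions, the channel block is assumed to be rotated by 90° between the two steps so that channels first run along rows and then along columns, and the function name `flow` is hypothetical.

```python
def flow(grid, channels, monomer, axis):
    """Couple `monomer` in every region swept by the selected channels.
    axis="row": channel i covers row i; axis="col": channel j covers column j."""
    size = len(grid)
    for i in range(size):
        for j in range(size):
            index = i if axis == "row" else j
            if index in channels:
                grid[i][j] += monomer
    return grid


grid = [["" for _ in range(3)] for _ in range(3)]
flow(grid, {0, 1}, "A", axis="row")  # first selected regions: rows 0 and 1
flow(grid, {1, 2}, "B", axis="col")  # block rotated: columns 1 and 2
for row in grid:
    print(row)
```

Regions swept only by the first channels carry A, regions swept only by the second carry B, and regions at the intersections carry AB; the one region touched by neither set stays empty.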

After the substrate is activated, monomer A can be flowed through some of the channels, monomer B can be flowed
through other channels, a monomer C can be flowed through still other channels, etc. In this manner, many or all of the
reaction regions are reacted with a monomer before the channel block must be moved or the substrate must be washed
and/or reactivated. By making use of many or all of the available reaction regions simultaneously, the number of washing
and activation steps can be minimized.

One of skill in the art will recognize that there are alternative methods of forming channels or otherwise protecting
a portion of the surface of the substrate. For example, according to some embodiments, a protective coating such as a
hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of the substrate
to be protected, sometimes in combination with materials that facilitate wetting by the reactant solution in other regions.
In this manner, the flowing solutions are further prevented from passing outside of their designated flow paths.

The "spotting" methods of preparing compounds and libraries of the present invention can be implemented in much
the same manner as the flow channel methods. For example, a monomer A can be delivered to and coupled with a first
group of reaction regions which have been appropriately activated. Thereafter, a monomer B can be delivered to and
reacted with a second group of activated reaction regions. Unlike the flow channel embodiments described above,
reactants are delivered by directly depositing (rather than flowing) relatively small quantities of them in selected regions. In
some steps, of course, the entire substrate surface can be sprayed or otherwise coated with a solution. In preferred
embodiments, a dispenser moves from region to region, depositing only as much monomer as necessary at each stop.
Typical dispensers include a micropipette to deliver the monomer solution to the substrate and a robotic system to control
the position of the micropipette with respect to the substrate. In other embodiments, the dispenser includes a series of
tubes, a manifold, an array of pipettes, or the like so that various reagents can be delivered to the reaction regions
simultaneously.

D. Generating An Array Of Oligonucleotides Using Pin-Based Methods

Another method which is useful for the preparation of an array of diverse oligonucleotides on a single substrate
involves "pin based synthesis." This method is described in detail in U.S. Patent No. 5,288,514, previously incorporated
herein by reference. The method utilizes a substrate having a plurality of pins or other extensions. The pins are each
inserted simultaneously into individual reagent containers in a tray. In a common embodiment, an array of 96
pins/containers is utilized.

Each tray is filled with a particular reagent for coupling in a particular chemical reaction on an individual pin.
Accordingly, the trays will often contain different reagents. Since the chemistry used is such that relatively similar reaction
conditions may be utilized to perform each of the reactions, multiple chemical coupling steps can be conducted
simultaneously. In the first step of the process, a substrate on which the chemical coupling steps are conducted is provided.
The substrate is optionally provided with a spacer having active sites. In the particular case of oligonucleotides, for
example, the spacer may be selected from a wide variety of molecules which can be used in organic environments
associated with synthesis as well as in aqueous environments associated with binding studies. Examples of suitable
spacers are polyethyleneglycols, dicarboxylic acids, polyamines and alkylenes, substituted with, for example, methoxy
and ethoxy groups. Additionally, the spacers will have an active site on the distal end. The active sites are optionally
protected initially by protecting groups. Among a wide variety of protecting groups which are useful are FMOC, BOC,
t-butyl esters, t-butyl ethers, and the like. Various exemplary protecting groups are described in, for example, Atherton et
al., Solid Phase Peptide Synthesis, IRL Press (1989), incorporated herein by reference. In some embodiments, the
spacer may provide for a cleavable function by way of, for example, exposure to acid or base.

E. Generating An Array Of Oligonucleotides Using Bead Based Methods

In addition to the foregoing methods, another method which is useful for synthesis of an array of oligonucleotides
involves "bead based synthesis." A general approach for bead based synthesis is described in copending Application
Ser. Nos. 07/762,522 (filed September 18, 1991); 07/946,239 (filed September 16, 1992); 08/146,886 (filed November
2, 1993); 07/876,792 (filed April 29, 1992) and PCT/US93/04145 (filed April 28, 1993), the disclosures of which are
incorporated herein by reference.

For the synthesis of molecules such as oligonucleotides on beads, a large plurality of beads are suspended in a
suitable carrier (such as water) in a container. The beads are provided with optional spacer molecules having an active
site. The active site is protected by an optional protecting group.

In a first step of the synthesis, the beads are divided for coupling into a plurality of containers. For the purposes of
this brief description, the number of containers will be limited to three, and the monomers denoted as A, B, C, D, E, and
F. The protecting groups are then removed and a first portion of the molecule to be synthesized is added to each of the
three containers (i.e., A is added to container 1, B is added to container 2 and C is added to container 3).

Thereafter, the various beads are appropriately washed of excess reagents, and remixed in one container. Again,
it will be recognized that by virtue of the large number of beads utilized at the outset, there will similarly be a large number
of beads randomly dispersed in the container, each having a particular first portion of the monomer to be synthesized
on a surface thereof.

Thereafter, the various beads are again divided for coupling in another group of three containers. The beads in the
first container are deprotected and exposed to a second monomer (D), while the beads in the second and third containers
are coupled to molecule portions E and F, respectively. Accordingly, molecules AD, BD, and CD will be present in the
first container, while AE, BE, and CE will be present in the second container, and molecules AF, BF, and CF will be
present in the third container. Each bead, however, will have only a single type of molecule on its surface. Thus, all of
the possible molecules formed from the first portions A, B, C, and the second portions D, E, and F have been formed.

The beads are then recombined into one container and additional steps are conducted to complete the synthesis
of the polymer molecules. In a preferred embodiment, the beads are tagged with an identifying tag which is unique to
the particular oligonucleotide which is present on each bead. A complete description of identifier tags for use in synthetic
libraries is provided in co-pending Application Ser. No. 08/146,886 (filed November 2, 1993), previously incorporated
by reference for all purposes.
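
The divide, couple and recombine cycle described above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function name split_and_pool, the bead count of 300 and the use of a seeded shuffle to stand in for random dispersal are illustrative choices, not part of the described method.

```python
import random
from itertools import product


def split_and_pool(n_beads, rounds, seed=0):
    """Simulate divide/couple/recombine synthesis on beads.

    `rounds` is a sequence of monomer groups, one group per coupling step,
    e.g. ["ABC", "DEF"]. Each bead carries exactly one growing chain.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    beads = [""] * n_beads              # one chain per bead, initially empty
    for monomers in rounds:
        rng.shuffle(beads)              # pooled beads are randomly dispersed
        k = len(monomers)
        coupled = []
        for i, monomer in enumerate(monomers):
            # Container i receives every k-th bead and couples one monomer.
            coupled.extend(chain + monomer for chain in beads[i::k])
        beads = coupled                 # recombine into a single container
    return beads


beads = split_and_pool(300, ["ABC", "DEF"])
products = set(beads)
expected = {a + b for a, b in product("ABC", "DEF")}
print(sorted(products))
```

With a large number of beads, the set of products covers every combination of a first portion (A, B, C) with a second portion (D, E, F), while each individual bead still carries a single molecule type.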

IV. Sequencing By Hybridization Using the Probe Tiling Strategy

Using the VLSIPS™ technology described above, one can generate arrays of immobilized probes which can be
used to compare a reference sequence of known sequence with a target sequence showing substantial similarity with
the reference sequence, but differing in the presence of, for example, mutations. In fact, WO 95/11995, the teachings
of which are incorporated herein by reference, describes a number of strategies for comparing a polynucleotide of known
sequence (a reference sequence) with variants of that sequence (target sequences). The comparison can be performed
at the level of entire genomes, chromosomes, genes, exons or introns, or it can focus on individual mutant sites and
immediately adjacent bases. The strategies allow detection of variations, such as mutations or polymorphisms, in the
target sequence irrespective of whether a particular variant has previously been characterized. The strategies both
define the nature of a variant and identify its location in a target sequence.

The strategies employ arrays of oligonucleotide probes immobilized to a solid support. Target sequences are
analyzed by determining the extent of hybridization at particular probes in the array. The strategy in selection of probes
facilitates distinction between perfectly matched probes and probes showing single-base or other degrees of
mismatches. The strategy usually entails sampling each nucleotide of interest in a target sequence several times, thereby
achieving a high degree of confidence in its identity. This level of confidence is further increased by sampling nucleotides
in the target sequence adjacent to the nucleotides of interest. The tiling strategies disclosed in WO 95/11995 result in
sequencing and comparison methods suitable for routine large-scale practice with a high degree of confidence in the
sequence output.

A. Selection of Reference Sequence

The arrays are designed to contain probes exhibiting complementarity to one or more selected reference sequences
whose sequence is known. The arrays are used to read a target sequence comprising either the reference sequence
itself or variants of that sequence. Target sequences may differ from the reference sequence at one or more positions
but show a high overall degree of sequence identity with the reference sequence (e.g., at least 75, 90, 95, 99, 99.9 or
99.99%). Any polynucleotide of known sequence can be selected as a reference sequence. Reference sequences of
interest include sequences known to include mutations or polymorphisms associated with phenotypic changes having
clinical significance in human patients. For example, the CFTR gene and p53 gene in humans have been identified as
the location of several mutations resulting in cystic fibrosis or cancer, respectively. Other reference sequences of interest
include those that serve to identify pathogenic microorganisms and/or are the site of mutations by which such
microorganisms acquire drug resistance (e.g., the HIV reverse transcriptase gene). Other reference sequences of interest
include regions where polymorphic variations are known to occur (e.g., the D-loop region of mitochondrial DNA). These
reference sequences have utility for, e.g., forensic or epidemiological studies. Other reference sequences of interest
include p34 (related to p53), p65 (implicated in breast, prostate and liver cancer), and DNA segments encoding
cytochromes P450 and other biotransformation genes (see Meyer et al., Pharmac. Ther. 46, 349-355 (1990)). Other reference
sequences of interest include those from the genome of pathogenic viruses (e.g., hepatitis (A, B, or C), herpes virus
(e.g., VZV, HSV-1, HAV-6, HSV-II, and CMV, Epstein Barr virus), adenovirus, influenza virus, flaviviruses, echovirus,
rhinovirus, coxsackie virus, coronavirus, respiratory syncytial virus, mumps virus, rotavirus, measles virus, rubella virus,
parvovirus, vaccinia virus, HTLV virus, dengue virus, papillomavirus, molluscum virus, poliovirus, rabies virus, JC virus
and arboviral encephalitis virus). Other exemplary reference sequences which can be analyzed using the tiling strategy
are disclosed in WO 95/11995.

The length of a reference sequence can vary widely from a full-length genome, to an individual chromosome,
episome, gene, component of a gene, such as an exon, intron or regulatory sequences, to a few nucleotides. A reference
sequence of between about 2, 5, 10, 20, 50, 100, 500, 1000, 5,000, 10,000, 20,000 or 100,000 nucleotides is common.
Sometimes only particular regions of a sequence (e.g., exons of a gene) are of interest. In such situations, the particular
regions can be considered as separate reference sequences or can be considered as components of a single reference
sequence, as a matter of arbitrary choice.

A reference sequence can be any naturally occurring, mutant, consensus or purely hypothetical sequence of
nucleotides, RNA or DNA. For example, sequences can be obtained from computer data bases, publications or can be
determined or conceived de novo. Usually, a reference sequence is selected to show a high degree of sequence identity to
envisaged target sequences. Often, particularly where a significant degree of divergence is anticipated between target
sequences, more than one reference sequence is selected. Combinations of wildtype and mutant reference sequences
are employed in several applications of the tiling strategy.

B. Array Design

1. Basic Tiling Strategy

The basic tiling strategy provides an array of immobilized probes for analysis of target sequences showing a high
degree of sequence identity to one or more selected reference sequences. The strategy is first illustrated for an array
that is subdivided into four probe sets, although it will be apparent that in some situations, satisfactory results are obtained
from only two probe sets. A first probe set comprises a plurality of probes exhibiting perfect complementarity with a
selected reference sequence. The perfect complementarity usually exists throughout the length of the probe. However,
probes having a segment or segments of perfect complementarity that is/are flanked by leading or trailing sequences
lacking complementarity to the reference sequence can also be used. Within a segment of complementarity, each probe
in the first probe set has at least one interrogation position that corresponds to a nucleotide in the reference sequence.
That is, the interrogation position is aligned with the corresponding nucleotide in the reference sequence, when the
probe and reference sequence are aligned to maximize complementarity between the two. If a probe has more than one
interrogation position, each corresponds with a respective nucleotide in the reference sequence. The identity of an
interrogation position and corresponding nucleotide in a particular probe in the first probe set cannot be determined
simply by inspection of the probe in the first set. As will become apparent, an interrogation position and corresponding
nucleotide is defined by the comparative structures of probes in the first probe set and corresponding probes from
additional probe sets.

In principle, a probe could have an interrogation position at each position in the segment complementary to the
reference sequence. Sometimes, interrogation positions provide more accurate data when located away from the ends
of a segment of complementarity. Thus, typically a probe having a segment of complementarity of length x does not
contain more than x-2 interrogation positions. Since probes are typically 9-21 nucleotides, and usually all of a probe is
complementary, a probe typically has 1-19 interrogation positions. Often the probes contain a single interrogation
position, at or near the center of the probe.

For each probe in the first set, there are, for purposes of the present illustration, up to three corresponding probes
from three additional probe sets. See FIG. 11. Thus, there are four probes corresponding to each nucleotide of interest
in the reference sequence. Each of the four corresponding probes has an interrogation position aligned with that
nucleotide of interest. Usually, the probes from the three additional probe sets are identical to the corresponding probe from
the first probe set with one exception. The exception is that at least one (and often only one) interrogation position, which
occurs in the same position in each of the four corresponding probes from the four probe sets, is occupied by a different
nucleotide in the four probe sets. For example, for an A nucleotide in the reference sequence, the corresponding probe
from the first probe set has its interrogation position occupied by a T, and the corresponding probes from the additional
three probe sets have their respective interrogation positions occupied by A, C, or G, a different nucleotide in each probe.
Of course, if a probe from the first probe set comprises trailing or flanking sequences lacking complementarity to the
reference sequences (see FIG. 12), these sequences need not be present in corresponding probes from the three
additional sets. Likewise, corresponding probes from the three additional sets can contain leading or trailing sequences
outside the segment of complementarity that are not present in the corresponding probe from the first probe set.
Occasionally, the probes from the additional three probe sets are identical (with the exception of interrogation position(s)) to
a contiguous subsequence of the full complementary segment of the corresponding probe from the first probe set. In
this case, the subsequence includes the interrogation position and usually differs from the full-length probe only in the
omission of one or both terminal nucleotides from the termini of a segment of complementarity. That is, if a probe from
the first probe set has a segment of complementarity of length n, corresponding probes from the other sets will usually
include a subsequence of the segment of at least length n-2. Thus, the subsequence is usually at least 3, 4, 7, 9, 15,
21, or 25 nucleotides long, most typically, in the range of 9-21 nucleotides. The subsequence should be sufficiently long
to allow a probe to hybridize detectably more strongly to a variant of the reference sequence mutated at the interrogation
position than to the reference sequence.
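
The construction of four corresponding probes for one nucleotide of interest can be sketched as follows. The sketch assumes that the whole probe is complementary, that there is a single central interrogation position, and that probes are written base-for-base against the reference (i.e., in the opposite strand orientation) so that indices line up. The function name probe_column and the example reference string are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def probe_column(reference, position, length=11):
    """Return the four corresponding probes for reference[position].

    Keys are the base occupying the interrogation position. The probe keyed
    by the complement of reference[position] belongs to the first probe set
    (perfect complementarity); the other three differ only at that position.
    """
    half = length // 2
    start = position - half
    if start < 0 or start + length > len(reference):
        raise ValueError("probe would extend past the reference sequence")
    window = reference[start:start + length]
    # Complement written position-by-position against the reference window.
    perfect = "".join(COMPLEMENT[base] for base in window)
    return {base: perfect[:half] + base + perfect[half + 1:] for base in "ACGT"}


reference = "GATTACAGATTACA"           # illustrative sequence only
column = probe_column(reference, 6)    # reference[6] is an A
first_set_probe = column[COMPLEMENT[reference[6]]]
```

For the A at the chosen position, the first-set probe carries a T at its interrogation position, and the remaining three probes carry A, C and G there, as in the example in the text.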

The probes can be oligodeoxyribonucleotides or oligoribonucleotides, or any modified forms of these polymers that
are capable of hybridizing with a target nucleic acid sequence by complementary base pairing. Complementary base pairing
means sequence-specific base pairing which includes, e.g., Watson-Crick base pairing as well as other forms of base
pairing such as Hoogsteen base pairing. Modified forms include 2'-O-methyl oligoribonucleotides and so-called PNAs,
in which oligodeoxyribonucleotides are linked via peptide bonds rather than phosphodiester bonds. The probes can be
attached by any linkage to a support (e.g., 3', 5' or via the base). 3' attachment is more usual as this orientation is
compatible with the preferred chemistry for solid phase synthesis of oligonucleotides.

The number of probes in the first probe set (and as a consequence the number of probes in additional probe sets)
depends on the length of the reference sequence, the number of nucleotides of interest in the reference sequence and
the number of interrogation positions per probe. In general, each nucleotide of interest in the reference sequence requires
the same interrogation position in the four sets of probes. Consider, as an example, a reference sequence of 100
nucleotides, 50 of which are of interest, and probes each having a single interrogation position. In this situation, the first probe
set requires fifty probes, each having one interrogation position corresponding to a nucleotide of interest in the reference
sequence. The second, third and fourth probe sets each have a corresponding probe for each probe in the first probe
set, and so each also contains a total of fifty probes. The identity of each nucleotide of interest in the reference sequence
is determined by comparing the relative hybridization signals at four probes having interrogation positions corresponding
to that nucleotide from the four probe sets.
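
The probe count in the worked example can be checked with a few lines of arithmetic. The helper name probes_required is hypothetical, and rounding up when a probe carries several interrogation positions is an assumption that gives only a lower bound for that case.

```python
import math


def probes_required(n_of_interest, positions_per_probe=1, n_sets=4):
    """Return (probes per set, total probes) for a tiling array.

    Assumes every probe set has one corresponding probe per first-set probe.
    """
    per_set = math.ceil(n_of_interest / positions_per_probe)
    return per_set, per_set * n_sets


per_set, total = probes_required(50)   # 50 nucleotides of interest
print(per_set, total)
```

For 50 nucleotides of interest and one interrogation position per probe, each of the four sets holds fifty probes, for two hundred probes in total.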

In some reference sequences, every nucleotide is of interest. In other reference sequences, only certain portions
in which variants (e.g., mutations or polymorphisms) are concentrated are of interest. In other reference sequences,
only particular mutations or polymorphisms and immediately adjacent nucleotides are of interest. Usually, the first probe
set has interrogation positions selected to correspond to at least a nucleotide (e.g., representing a point mutation) and
one immediately adjacent nucleotide. Usually, the probes in the first set have interrogation positions corresponding to
at least 3, 10, 50, 100, 1000, or 20,000 contiguous nucleotides. The probes usually have interrogation positions
corresponding to at least 5, 10, 30, 50, 75, 90, 99 or sometimes 100% of the nucleotides in a reference sequence. Frequently,
the probes in the first probe set completely span the reference sequence and overlap with one another relative to the
reference sequence. For example, in one common arrangement each probe in the first probe set differs from another
probe in that set by the omission of a 3' base complementary to the reference sequence and the acquisition of a 5' base
complementary to the reference sequence. See FIG. 13.

The number of probes on the array can be quite large (e.g., 10^5-10^6). However, often only a relatively small proportion
(i.e., less than about 50%, 25%, 10%, 5% or 1%) of the total number of probes of a given length are selected to pursue
a particular tiling strategy. For example, a complete set of octamer probes comprises 65,536 probes; thus, an array of
the invention typically has fewer than 32,768 octamer probes. A complete array of decamer probes comprises 1,048,576
probes; thus, an array of the invention typically has fewer than about 500,000 decamer probes. Often arrays have a
lower limit of 25, 50 or 100 probes and an upper limit of 1,000,000, 100,000, 10,000 or 1000 probes. The arrays can
have other components besides the probes such as linkers attaching the probes to a support.
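
The sizes quoted for complete probe sets follow from there being four possible nucleotides at each position. A brief check, using a hypothetical helper name:

```python
def complete_set_size(length, alphabet_size=4):
    """Number of distinct probes of a given length over A, C, G and T."""
    return alphabet_size ** length


octamers = complete_set_size(8)     # 4**8
decamers = complete_set_size(10)    # 4**10
print(octamers, octamers // 2, decamers)
```

Half of the complete octamer set is 32,768 probes, which matches the "fewer than 50%" bound given for octamers.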

Some advantages of the use of only a proportion of all possible probes of a given length include: (i) each position
in the array is highly informative, whether or not hybridization occurs; (ii) nonspecific hybridization is minimized; (iii) it is
straightforward to correlate hybridization differences with sequence differences, particularly with reference to the
hybridization pattern of a known standard; and (iv) the ability to address each probe independently during synthesis, using
high resolution photolithography, allows the array to be designed and optimized for any sequence. For example, the length
of any probe can be varied independently of the others.

For conceptual simplicity, the probes in a set are usually arranged in order of the sequence in a lane across the
array. A lane contains a series of overlapping probes, which represent, or tile across, the selected reference sequence
(see FIG. 13). The components of the four sets of probes are usually laid down in four parallel lanes, collectively
constituting a row in the horizontal direction and a series of 4-member columns in the vertical direction. Corresponding
probes from the four probe sets (i.e., complementary to the same subsequence of the reference sequence) occupy a
column. Each probe in a lane usually differs from its predecessor in the lane by the omission of a base at one end and
the inclusion of an additional base at the other end as shown in FIG. 13. However, this orderly progression of probes can
be interrupted by the inclusion of control probes or omission of probes in certain columns of the array. Such columns
serve as controls to orient the array, or gauge the background, which can include target sequence nonspecifically bound
to the array.

The probe sets are usually laid down in lanes such that all probes having an interrogation position occupied by an
A form an A-lane, all probes having an interrogation position occupied by a C form a C-lane, all probes having an
interrogation position occupied by a G form a G-lane, and all probes having an interrogation position occupied by a T
(or U) form a T-lane (or a U-lane). Note that in this arrangement there is not a unique correspondence between probe
sets and lanes. Thus, the probe from the first probe set is laid down in the A-lane, C-lane, A-lane, A-lane and T-lane for
the five columns in FIG. 14A. The interrogation position on a column of probes corresponds to the position in the target
sequence whose identity is determined from analysis of hybridization to the probes in that column. Thus, I1-I5 respectively
correspond to N1-N5 in FIG. 14A. The interrogation position can be anywhere in a probe but is usually at or near the
central position of the probe to maximize differential hybridization signals between a perfect match and a single-base
mismatch. For example, for an 11 mer probe, the central position is the sixth nucleotide.
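
The lane arrangement can be sketched by sliding a window across the reference and sorting each column's four probes by the base at the interrogation position. The sketch assumes fully complementary 11 mer probes with a central interrogation position, written base-for-base against the reference. The function name lay_out_lanes and the example reference are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def lay_out_lanes(reference, length=11):
    """Tile probes across `reference` and sort them into A, C, G and T lanes.

    Returns (lanes, first_set_lane). lanes[base][i] is the probe in column i
    whose interrogation position holds `base`; first_set_lane[i] names the
    lane that holds the first-set (perfectly complementary) probe of column i.
    """
    center = length // 2    # index 5, i.e. the sixth nucleotide of an 11 mer
    lanes = {base: [] for base in "ACGT"}
    first_set_lane = []
    for start in range(len(reference) - length + 1):
        window = reference[start:start + length]
        perfect = "".join(COMPLEMENT[base] for base in window)
        first_set_lane.append(perfect[center])
        for base in "ACGT":
            lanes[base].append(perfect[:center] + base + perfect[center + 1:])
    return lanes, first_set_lane


lanes, first_set_lane = lay_out_lanes("ACGTACGTACGTACG")
print(first_set_lane)
```

Because the lane is determined by the base at the interrogation position, the first-set probe moves from lane to lane along the array, which is why there is no unique correspondence between probe sets and lanes.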

Although the array of probes is usually laid down in rows and columns as described above, such a physical
arrangement of probes on the array is not essential. Provided that the spatial location of each probe in an array is known, the
data from the probes can be collected and processed to yield the sequence of a target irrespective of the physical
arrangement of the probes on an array. In processing the data, the hybridization signals from the respective probes can
be reassorted into any conceptual array desired for subsequent data reduction whatever the physical arrangement of
probes on the array.

A range of lengths of probes can be employed in the arrays. As noted above, a probe may consist exclusively of a
complementary segment, or may have one or more complementary segments juxtaposed by flanking, trailing and/or
intervening segments. In the latter situation, the total length of complementary segment(s) is more important than the
length of the probe. In functional terms, the complementary segment(s) of the first probe sets should be sufficiently long
to allow the probe to hybridize detectably more strongly to a reference sequence compared with a variant of the reference
including a single base mutation at the nucleotide corresponding to the interrogation position of the probe. Similarly, the
complementary segment(s) in corresponding probes from additional probe sets should be sufficiently long to allow a
probe to hybridize detectably more strongly to a variant of the reference sequence having a single nucleotide substitution
at the interrogation position relative to the reference sequence. A probe usually has a single complementary segment
having a length of at least 3 nucleotides, and more usually at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25 or 30 bases exhibiting perfect complementarity (other than possibly at the interrogation position(s)
depending on the probe set) to the reference sequence. In bridging strategies, where more than one segment of
complementarity is present, each segment provides at least three complementary nucleotides to the reference sequence
and the combined segments provide at least two segments of three or a total of six complementary nucleotides. As in
the other strategies, the combined length of complementary segments is typically from 6-30 nucleotides, and preferably
from about 9-21 nucleotides. The two segments are often approximately the same length. Often, the probes (or segments
of complementarity within probes) have an odd number of bases, so that an interrogation position can occur in the exact
center of the probe.

In some arrays, all probes are the same length. Other arrays employ different groups of probe sets, in which case
the probes are of the same size within a group, but differ between different groups. For example, some arrays have one
group comprising four sets of probes as described above in which all the probes are 11 mers, together with a second
group comprising four sets of probes in which all of the probes are 13 mers. Of course, additional groups of probes can
be added. Thus, some arrays contain, e.g., four groups of probes having sizes of 11 mers, 13 mers, 15 mers and 17
mers. Other arrays have different size probes within the same group of four probe sets. In these arrays, the probes in
the first set can vary in length independently of each other. Probes in the other sets are usually the same length as the
probe occupying the same column from the first set. However, occasionally different lengths of probes can be included
at the same column position in the four lanes. The different length probes are included to equalize hybridization signals
from probes irrespective of whether A-T or C-G bonds are formed at the interrogation position.

The length of a probe can be important in distinguishing between a perfectly matched probe and probes showing a
single-base mismatch with the target sequence. The discrimination is usually greater for short probes. Shorter probes
are usually also less susceptible to formation of secondary structures. However, the absolute amount of target sequence
bound, and hence the signal, is greater for larger probes. The probe length representing the optimum compromise
between these competing considerations may vary depending on, inter alia, the GC content of a particular region of the
target DNA sequence, secondary structure, synthesis efficiency and cross-hybridization. In some regions of the target,
depending on hybridization conditions, short probes (e.g., 11 mers) may provide information that is inaccessible from
longer probes (e.g., 19 mers) and vice versa. Maximum sequence information can be read by including several groups
of different sized probes on the array as noted above. However, for many regions of the target sequence, such a strategy
provides redundant information in that the same sequence is read multiple times from the different groups of probes.
Equivalent information can be obtained from a single group of different sized probes in which the sizes are selected to
maximize readable sequence at particular regions of the target sequence. The strategy of customizing probe length
within a single group of probe sets minimizes the total number of probes required to read a particular target sequence.
This leaves ample capacity for the array to include probes to other reference sequences.

The invention provides an optimization block which allows systematic variation of probe length and interrogation
position to optimize the selection of probes for analyzing a particular nucleotide in a reference sequence. The block
comprises alternating columns of probes complementary to the wildtype target and probes complementary to a specific
mutation. The interrogation position is varied between columns and probe length is varied down a column. Hybridization
of the array to the reference sequence or the mutant form of the reference sequence identifies the probe length and
interrogation position providing the greatest differential hybridization signal.
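
Selecting a design from an optimization block amounts to picking the length and interrogation position with the largest differential signal. In the sketch below, the signal values are invented purely for illustration, the function name best_design is hypothetical, and the differential is taken as a simple difference between matched and mismatched signals, which is an assumption (a ratio could equally be used).

```python
def best_design(signals):
    """Pick the (probe length, interrogation position) with the greatest
    difference between matched and mismatched hybridization signal.

    `signals` maps (length, position) -> (match_signal, mismatch_signal).
    """
    return max(signals, key=lambda design: signals[design][0] - signals[design][1])


# Made-up intensities for three cells of an optimization block.
block = {
    (11, 6): (100.0, 40.0),
    (13, 7): (150.0, 60.0),
    (15, 8): (180.0, 120.0),
}
chosen = best_design(block)
print(chosen)
```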

Variation of interrogation position in probes for analyzing different regions of a target sequence offers a number of
advantages. If a segment of a target sequence contains two closely spaced mutations, m1 and m2, and probes for
analyzing that segment have an interrogation position at or near the middle, then no probe has an interrogation position
aligned with one of the mutations without overlapping the other mutation (see first probe in FIG. 14B). Thus, the presence
of a mutation would have to be detected by comparing the hybridization signal of a single-mismatched probe with a
double-mismatched probe. By contrast, if the interrogation position is near the 3' end of the probes, probes can have
their interrogation position aligned with m1 without overlapping m2 (second probe in FIG. 14B). Thus, the mutation can
be detected by a comparison of a perfectly matched probe with single-base mismatched probes. Similarly, if the
interrogation position is near the 5' end of the probes, probes can have their interrogation position aligned with m2 without
overlapping m1 (third probe in FIG. 14B).

Variation of the interrogation position also offers the advantage of reducing loss of signal due to self-annealing of
certain probes. FIG. 14C shows a target sequence having a nucleotide X, which can be read either from the relative
signals of the four probes having a central interrogation position (shown at the left of the figure) or from the four probes
having the interrogation position near the 3' end (shown at the right of the figure). Only the probes having the
central interrogation position are capable of self-annealing. Thus, a higher signal is obtained from the probes having the
interrogation position near the terminus.

The probes are designed to be complementary to either strand of the reference sequence (e.g., coding or
non-coding). Some arrays contain separate groups of probes, one complementary to the coding strand, the other
complementary to the noncoding strand. Independent analysis of coding and noncoding strands provides largely redundant
information. However, the regions of ambiguity in reading the coding strand are not always the same as those in reading
the noncoding strand. Thus, combination of the information from coding and noncoding strands increases the overall
accuracy of sequencing.

Some arrays contain additional probes or groups of probes designed to be complementary to a second reference
sequence. The second reference sequence is often a subsequence of the first reference sequence bearing one or more
commonly occurring mutations or interstrain variations. The second group of probes is designed by the same principles
as described above except that the probes exhibit complementarity to the second reference sequence. The inclusion of
a second group is particularly useful for analyzing short subsequences of the primary reference sequence in which multiple
mutations are expected to occur within a short distance commensurate with the length of the probes (i.e., two or more
mutations within 9 to 21 bases). Of course, the same principle can be extended to provide arrays containing groups of
probes for any number of reference sequences. Alternatively, the arrays may contain additional probe(s) that do not form
part of a tiled array as noted above, but rather serve as probe(s) for a conventional reverse dot blot. For example, the
presence of a mutation can be detected from binding of a target sequence to a single oligomeric probe harboring the
mutation. Preferably, an additional probe containing the equivalent region of the wildtype sequence is included as a
control.

Although only a subset of probes is required to analyze a particular target sequence, it is quite possible that other
probes superfluous to the contemplated analysis are also included on the array. In the extreme case, the array could
contain a complete set of all probes of a given length notwithstanding that only a small subset is required to analyze the
particular reference sequence of interest. Although such a situation might appear wasteful of resources, an array including
a complete set of probes offers the advantage of including the appropriate subset of probes for analyzing any reference
sequence. Such an array also allows simultaneous analysis of a reference sequence from different subsets of probes
(e.g., subsets having the interrogation site at different positions in the probe).

In its simplest terms, the analysis of an array reveals whether the target sequence is the same or different from the
reference sequence. If the two are the same, all probes in the first probe set show a stronger hybridization signal than
corresponding probes from other probe sets. If the two are different, most probes from the first probe set still show a
stronger hybridization signal than corresponding probes from the other probe sets, but some probes from the first probe
set do not. Thus, when a probe from another probe set lights up more strongly than the corresponding probe from the
first probe set, this provides a simple visual indication that the target sequence and reference sequence differ.

The arrays also reveal the nature and position of differences between the target and reference sequence. The arrays
are read by comparing the intensities of labelled target bound to the probes in an array. Specifically, for each nucleotide
of interest in the target sequence, a comparison is performed between probes having an interrogation position aligned
with that position. These probes form a column (actual or conceptual) on the array. For example, a column often contains
one probe from each of A, C, G and T lanes. The nucleotide in the target sequence is identified as the complement of
the nucleotide occupying the interrogation position in the probe showing the highest hybridization signal from a column.
FIG. 15 shows the hybridization pattern of an array hybridized to its reference sequence. The dark square in each column
represents the probe from the column having the highest hybridization signal. The sequence can be read by following
the pattern of dark squares from left to right across the array. The first dark square is in the A lane indicating that the
nucleotide occupying the interrogation position of the probe represented by this square is an A. The first nucleotide in
the reference sequence is the complement of the nucleotide occupying the interrogation position of this probe (i.e., a T).
Similarly, the second dark square is in the T-lane, from which it can be deduced that the second nucleotide in the reference
sequence is an A. Likewise, the third dark square is in the T-lane, from which it can be deduced that the third nucleotide
in the reference sequence is also an A, and so forth. By including probes in the first probe set (and by implication in the
other probe sets) with interrogation positions corresponding to every nucleotide in a reference sequence, it is possible
to read substantially every nucleotide in a target sequence, thereby revealing the complete or nearly complete sequence
of the target.
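
The reading rule (take the brightest probe in each column and report the complement of the base at its interrogation position) can be sketched as follows. The intensity values are invented for illustration and the function name read_sequence is hypothetical. The three columns mirror the A-lane, T-lane, T-lane pattern described for FIG. 15.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_sequence(columns):
    """Read a target sequence from per-column lane intensities.

    Each column maps a lane name (the base at the interrogation position)
    to a hybridization signal. The called nucleotide is the complement of
    the lane with the highest signal.
    """
    calls = []
    for column in columns:
        brightest_lane = max(column, key=column.get)
        calls.append(COMPLEMENT[brightest_lane])
    return "".join(calls)


# Made-up intensities: brightest lanes are A, T, T.
columns = [
    {"A": 95, "C": 8, "G": 11, "T": 14},
    {"A": 10, "C": 7, "G": 9, "T": 88},
    {"A": 12, "C": 6, "G": 10, "T": 91},
]
sequence = read_sequence(columns)
print(sequence)
```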

Of the four probes in a column, only one can exhibit a perfect match to the target sequence whereas the others
usually exhibit at least a one base pair mismatch. The probe exhibiting a perfect match usually produces a substantially
greater hybridization signal than the other three probes in the column and is thereby easily identified. However, in some
regions of the target sequence, the distinction between a perfect match and a one-base mismatch is less clear. Thus,
a call ratio is established to define the ratio of signal from the best hybridizing probe to the second best hybridizing
probe that must be exceeded for a particular target position to be read from the probes. A high call ratio ensures that
few if any errors are made in calling target nucleotides, but can result in some nucleotides being scored as ambiguous,
which could in fact be accurately read. A lower call ratio results in fewer ambiguous calls, but can result in more erroneous
calls. It has been found that at a call ratio of 1.2 virtually all calls are accurate. However, a small but significant number
of bases (e.g., up to about 10%) may have to be scored as ambiguous.
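
The effect of the call ratio can be illustrated with a short Python sketch. The function name `call_base`, the use of "N" to mark an ambiguous position and the signal values are assumptions introduced for the example; only the default ratio of 1.2 comes from the text.

```python
# Illustrative sketch: names, the "N" symbol and the signals are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(column, call_ratio=1.2, ambiguous="N"):
    """Call a target base only if best/second-best signal exceeds call_ratio."""
    ranked = sorted(column.items(), key=lambda item: item[1], reverse=True)
    (best_lane, best), (_, second) = ranked[0], ranked[1]
    if best <= 0:
        return ambiguous  # no usable signal in this column
    if second <= 0 or best / second > call_ratio:
        return COMPLEMENT[best_lane]
    return ambiguous


clear = {"A": 950, "C": 120, "G": 90, "T": 140}
close = {"A": 300, "C": 280, "G": 90, "T": 100}
print(call_base(clear))                    # ratio well above 1.2
print(call_base(close))                    # ratio about 1.07, below 1.2
print(call_base(close, call_ratio=1.05))   # lower threshold permits a call
```

Lowering `call_ratio` converts the ambiguous position into a call, which is the trade-off between ambiguous and erroneous calls noted above.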

Although small regions of the target sequence can sometimes be ambiguous, these regions usually occur at the
same or similar segments in different target sequences. Thus, for precharacterized mutations, it is known in advance
whether that mutation is likely to occur within a region of unambiguously determinable sequence.

An array of probes is most useful for analyzing the reference sequence from which the probes were designed and
variants of that sequence exhibiting substantial sequence similarity with the reference sequence (e.g., several single-base
mutants spaced over the reference sequence). When an array is used to analyze the exact reference sequence
from which it was designed, one probe exhibits a perfect match to the reference sequence, and the other three probes
in the same column exhibit single-base mismatches. Thus, discrimination between hybridization signals is usually high
and accurate sequence is obtained. High accuracy is also obtained when an array is used for analyzing a target sequence
comprising a variant of the reference sequence that has a single mutation relative to the reference sequence, or several
widely spaced mutations relative to the reference sequence. At different mutant loci, one probe exhibits a perfect match
to the target, and the other three probes occupying the same column exhibit single-base mismatches, the difference
(with respect to analysis of the reference sequence) being the lane in which the perfect match occurs.

For target sequences showing a high degree of divergence from the reference strain or incorporating several closely
spaced mutations from the reference strain, a single group of probes (i.e., designed with respect to a single reference
sequence) will not always provide accurate sequence for the highly variant region of this sequence. At some particular
columnar positions, it may be that no single probe exhibits perfect complementarity to the target and that any comparison
must be based on different degrees of mismatch between the four probes. Such a comparison does not always allow
the target nucleotide corresponding to that columnar position to be called. Deletions in target sequences can be detected
by loss of signal from probes having interrogation positions encompassed by the deletion. However, signal may also be
lost from probes having interrogation positions closely proximal to the deletion, resulting in some regions of the target
sequence that cannot be read. Target sequence bearing insertions will also exhibit short regions including and proximal
to the insertion that usually cannot be read.

The presence of short regions of difficult-to-read target because of closely spaced mutations, insertions or deletions
does not prevent determination of the remaining sequence of the target, as different regions of a target sequence are
determined independently. Moreover, such ambiguities as might result from analysis of diverse variants with a single
group of probes can be avoided by including multiple groups of probe sets on an array. For example, one group of probes
can be designed based on a full-length reference sequence, and the other groups on subsequences of the reference
sequence incorporating frequently occurring mutations or strain variations.

A particular advantage of the present sequencing strategy over conventional sequencing methods is the capacity
simultaneously to detect and quantify proportions of multiple target sequences. Such capacity is valuable, e.g., for
diagnosis of patients who are heterozygous with respect to a gene or who are infected with a virus, such as HIV, which is
usually present in several polymorphic forms. Such capacity is also useful in analyzing targets from biopsies of tumor
cells and surrounding tissues. The presence of multiple target sequences is detected from the relative signals of the
four probes at the array columns corresponding to the target nucleotides at which diversity occurs. The relative signals
of the four probes for the mixture under test are compared with the corresponding signals from a homogeneous reference
sequence. An increase in a signal from a probe that is mismatched with respect to the reference sequence, and a
corresponding decrease in the signal from the probe which is matched with the reference sequence, signal the presence
of a mutant strain in the mixture. The extent of the shift in hybridization signals of the probes is related to the proportion of
a target sequence in the mixture. Shifts in relative hybridization signals can be quantitatively related to proportions of
reference and mutant sequence by prior calibration of the array with seeded mixtures of the mutant and reference
sequences. By this means, an array can be used to detect variant or mutant strains constituting as little as 1, 5, 20, or 25%
of a mixture of strains.
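
One way to relate a shift in relative signal to a proportion of mutant sequence is to interpolate against a calibration curve obtained from seeded mixtures. The following Python sketch assumes a monotonic calibration curve and piecewise-linear interpolation; the function names and every calibration value are hypothetical and serve only to illustrate the idea of prior calibration.

```python
# Illustrative sketch: names and calibration values are hypothetical.
def signal_fraction(mutant_signal, reference_signal):
    """Share of the total signal carried by the mutant-matched probe."""
    return mutant_signal / (mutant_signal + reference_signal)


def estimate_mutant_proportion(observed, calibration):
    """Interpolate a mutant proportion from seeded-mixture calibration data.

    calibration is a list of (known_mutant_proportion, signal_fraction)
    pairs, assumed to increase monotonically in signal fraction.
    """
    points = sorted(calibration, key=lambda point: point[1])
    if observed <= points[0][1]:
        return points[0][0]
    if observed >= points[-1][1]:
        return points[-1][0]
    for (p0, s0), (p1, s1) in zip(points, points[1:]):
        if s0 <= observed <= s1:
            if s1 == s0:
                return p0
            return p0 + (p1 - p0) * (observed - s0) / (s1 - s0)


# Seeded mixtures: known mutant proportion versus measured signal fraction.
calibration = [(0.0, 0.10), (0.25, 0.30), (0.50, 0.52), (1.0, 0.90)]
observed = signal_fraction(mutant_signal=41, reference_signal=59)
estimate = estimate_mutant_proportion(observed, calibration)
print(round(estimate, 3))
```

A nonzero signal fraction at 0% mutant is included in the hypothetical calibration to reflect residual hybridization to mismatched probes.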

Similar principles allow the simultaneous analysis of multiple target sequences even when none is identical to the
reference sequence. For example, with a mixture of two target sequences bearing first and second mutations, there
would be a variation in the hybridization patterns of probes having interrogation positions corresponding to the first and
second mutations relative to the hybridization pattern with the reference sequence. At each position, one of the probes
having a mismatched interrogation position relative to the reference sequence would show an increase in hybridization
signal, and the probe having a matched interrogation position relative to the reference sequence would show a decrease
in hybridization signal. Analysis of the hybridization pattern of the mixture of mutant target sequences, preferably in
comparison with the hybridization pattern of the reference sequence, indicates the presence of two mutant target
sequences, the position and nature of the mutation in each strain, and the relative proportions of each strain.

In a variation of the above method, several target sequences are differentially labelled before
being simultaneously applied to the array. For example, each different target sequence can be labelled with a fluorescent
label emitting at a different wavelength. After applying a mixture of target sequences to the array, the individual target
sequences can be distinguished and independently analyzed by virtue of the differential labels. For example,
target sequences obtained from a patient at different stages of a disease can be differently labelled and analyzed
simultaneously, facilitating identification of new mutations.

2. Block Tiling

In block tiling, a perfectly matched (or wildtype) probe is compared with multiple sets of mismatched or mutant
probes. The perfectly matched probe and the multiple sets of mismatched probes with which it is compared collectively
form a group or block of probes on the array. Each set comprises at least one, and usually three, mismatched probes.
FIG. 16 shows a perfectly matched probe (CAATCGA) having three interrogation positions (I1, I2 and I3). The perfectly
matched probe is compared with three sets of probes (arbitrarily designated A, B and C), each having three mismatched
probes. In set A, the three mismatched probes are identical to a sequence comprising the perfectly matched probe or
a subsequence thereof including the interrogation positions, except at the first interrogation position. That is, the
mismatched probes in set A differ from the perfectly matched probe at the first interrogation position. Thus, the
relative hybridization signals of the perfectly matched probe and the mismatched probes in set A indicate the identity
of the nucleotide in a target sequence corresponding to the first interrogation position. This nucleotide is the complement
of the nucleotide occupying the interrogation position of the probe showing the highest signal. Similarly, set B comprises
three mismatched probes that differ from the perfectly matched probe at the second interrogation position. The relative
hybridization intensities of the perfectly matched probe and the three mismatched probes of set B reveal the identity of
the nucleotide in the target sequence corresponding to the second interrogation position (i.e., n2 in FIG. 16). Similarly,
the three mismatched probes in set C in FIG. 16 differ from the perfectly matched probe at the third interrogation position.
Comparison of the hybridization intensities of the perfectly matched probe and the mismatched probes in set C
reveals the identity of the nucleotide in the target sequence corresponding to the third interrogation position (n3).
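
The construction of a block can be sketched as follows. The perfectly matched probe CAATCGA is taken from the text, but the choice of the three central nucleotides (zero-based indices 2, 3 and 4) as the interrogation positions is an assumption made for illustration, as is the function name `block_tiling`.

```python
# Illustrative sketch: interrogation indices and names are assumptions.
BASES = "ACGT"


def block_tiling(perfect_match, interrogation_positions):
    """Return one set of three mismatched probes per interrogation position."""
    probe_sets = []
    for position in interrogation_positions:
        original = perfect_match[position]
        mismatches = [
            perfect_match[:position] + base + perfect_match[position + 1:]
            for base in BASES
            if base != original
        ]
        probe_sets.append(mismatches)
    return probe_sets


perfect = "CAATCGA"
set_a, set_b, set_c = block_tiling(perfect, [2, 3, 4])
print(set_a)  # probes differing only at the first interrogation position
print(set_b)  # probes differing only at the second interrogation position
print(set_c)  # probes differing only at the third interrogation position
```

Under these assumptions the block contains one perfectly matched probe and nine mismatched probes, ten probes in total.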

As noted above, a perfectly matched probe may have seven or more interrogation positions. If there are seven
interrogation positions, there are seven sets of three mismatched probes, each set serving to identify the nucleotide
corresponding to one of the seven interrogation positions. Similarly, if there are 20 interrogation positions in the perfectly
matched probe, then 20 sets of three mismatched probes are employed. As in other tiling strategies, selected probes
can be omitted if it is known in advance that only certain types of mutations are likely to arise.

Each block of probes allows short regions of a target sequence to be read. For example, for a block of probes having
seven interrogation positions, seven nucleotides in the target sequence can be read. Of course, an array can contain any
number of blocks depending on how many nucleotides of the target are of interest. The hybridization signals for each
block can be analyzed independently of any other block. The block tiling strategy can also be combined with other tiling
strategies, with different parts of the same reference sequence being tiled by different strategies.

The block tiling strategy is a species of the basic tiling strategy discussed above, in which the probe from the first
probe set has more than one interrogation position. The perfectly matched probe in the block tiling strategy is equivalent
to a probe from the first probe set in the basic tiling strategy. The three mismatched probes in set A in block tiling are
equivalent to probes from the second, third and fourth probe sets in the basic tiling strategy. The three mismatched
probes in set B of block tiling are equivalent to probes from additional probe sets in basic tiling arbitrarily designated the
fifth, sixth and seventh probe sets. The three mismatched probes in set C of block tiling are equivalent to probes from
three further probe sets in basic tiling arbitrarily designated the eighth, ninth and tenth probe sets.

The block tiling strategy offers two advantages over a basic strategy in which each probe in the first set has a single
interrogation position. One advantage is that the same sequence information can be obtained from fewer probes. A
second advantage is that each of the probes constituting a block (i.e., a probe from the first probe set and a corresponding
probe from each of the other probe sets) can have identical 3' and 5' sequences, with the variation confined to a central
segment containing the interrogation positions. The identity of 3' sequence between different probes simplifies the
strategy for solid phase synthesis of the probes on the array and results in more uniform deposition of the different probes
on the array, thereby in turn increasing the uniformity of signal to noise ratio for different regions of the array.
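
The saving in probe number can be checked with simple arithmetic. The sketch below assumes that a block with n interrogation positions uses one perfectly matched probe plus n sets of three mismatched probes, and that the basic strategy uses a column of four probes for each nucleotide read; the function names are hypothetical.

```python
# Illustrative arithmetic under the stated assumptions.
def probes_block_tiling(n_positions):
    """One shared perfect-match probe plus three mismatches per position."""
    return 1 + 3 * n_positions


def probes_basic_tiling(n_positions):
    """A column of four probes for every nucleotide read."""
    return 4 * n_positions


for n in (3, 7, 20):
    print(n, probes_block_tiling(n), probes_basic_tiling(n))
```

For seven interrogation positions this gives 22 probes for a block compared with 28 for the basic strategy, and for 20 positions 61 compared with 80.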

V. Enzymatic Discrimination Enhancement

Unfortunately, using the foregoing tiling strategies as well as other Sequencing By Hybridization techniques (e.g.,
those disclosed in co-pending Application Ser. Nos. 08/082,937 (filed June 25, 1993) and 08/168,904 (filed December
15, 1993), each of which is incorporated herein by reference for all purposes), it is frequently difficult to discriminate
between fully complementary hybrids and those that differ by one or more base pairs. However, it has now been
determined that sequencing by hybridization can be improved by using various enzymes that catalyze oligonucleotide
cleavage and ligation reactions. More particularly, discrimination between fully complementary hybrids and those that differ
by one or more base pairs can be greatly enhanced by using various enzymes that catalyze oligonucleotide cleavage
and ligation reactions.

A. Enhanced Discrimination Using Nuclease Treatment

Nuclease treatment can be used to improve the quality of hybridization signals on high density oligonucleotide
arrays. More particularly, after the array of oligonucleotides has been combined with a labelled target nucleic acid to
form target-oligonucleotide hybrid complexes, the target-oligonucleotide hybrid complexes are treated with a nuclease
and, in turn, they are washed to remove non-perfectly complementary target-oligonucleotide hybrid complexes. Following
nuclease treatment, the target:oligonucleotide hybrid complexes which are perfectly complementary are more readily
identified. From the location of the labelled targets, the oligonucleotide probes which hybridized with the targets can be
identified and, in turn, the sequence of the target nucleic acid can more readily be determined or verified.

The particular nuclease used will depend on the target nucleic acid being sequenced. If the target is RNA, an RNA
nuclease is used. Similarly, if the target is DNA, a DNA nuclease is used. RNase A is an example of an RNA nuclease
that can be used to increase the quality of RNA hybridization signals on high density oligonucleotide arrays. RNase A
effectively recognizes and cuts single-stranded RNA, including RNA in RNA:DNA hybrids that is not in a perfect
double-stranded structure. Moreover, RNA bulges, loops, and even single base mismatches can be recognized and cleaved by
RNase A. In addition, RNase A recognizes and cleaves target RNA which binds to multiple oligonucleotide probes
present on the substrate if there are intervening single-stranded regions. S1 nuclease and Mung Bean nuclease are
examples of DNA nucleases which can be used to improve the DNA hybridization signals on high density oligonucleotide
arrays. Other nucleases, which will be apparent to those of skill in the art, can similarly be used to increase the quality
of RNA hybridization signals on high density oligonucleotide arrays and, in turn, to more accurately determine the
sequence of the target nucleic acid.

FIG. 4 is a schematic outline of a hybridization procedure which can be carried out prior to nuclease treatment.
Fluorescein-UTP and -CTP labelled RNA is prepared from a PCR product by in vitro transcription. The RNA is fragmented
by heating and allowed to hybridize with an array of oligonucleotide probes on a single substrate. The array of
oligonucleotide probes is generated using the tiling procedure described so that the array of oligonucleotide probes is capable
of recognizing substantially all of the possible subsequences present in the target RNA. Moreover, for purposes of
comparison, the array of oligonucleotides is preferably generated so that all of the four possible probes for a given position
to be identified are in close proximity to one another (i.e., so that they are in predefined regions which are near to one
another). Following hybridization, the substrate is rinsed with the hybridization buffer and a quantitative fluorescence
image of the hybridization pattern is obtained by, for example, scanning the substrate with a confocal microscope. It
should be noted that confocal detection allows hybridization to be measured in the presence of excess labelled target
and, hence, if desired, hybridization can be detected in real time.

Following hybridization, the substrate having an array of target:oligonucleotide hybridization complexes thereon is
contacted with a nuclease. This is most simply carried out by adding a solution of the nuclease to the surface of the
substrate. Alternatively, however, this can be carried out by flowing a solution of the nuclease over the substrate using,
for example, techniques similar to the flow channel methods described above. The nuclease solution is typically formed
using the buffer used to carry out the hybridization reaction (i.e., the hybridization buffer). The concentration of the
nuclease will vary depending on the particular nuclease used, but will typically range from about 0.05 µg/ml to about 2
mg/ml. Moreover, the time in which the array of target:oligonucleotide hybridization complexes is in contact with the
nuclease will vary. Typically, nuclease treatment is carried out for a period of time ranging from about 5 minutes to 3
hours. Following treatment with the nuclease, the substrate is again washed with the hybridization buffer, and a
quantitative fluorescence image of the hybridization pattern is obtained by scanning the substrate with, for example, a confocal
microscope.

As such, nuclease treatment can be used following hybridization to improve the quality of hybridization signals on
high density oligonucleotide arrays and, in turn, to more accurately determine the sequence of the target nucleic acid.
It will be readily apparent to those of skill in the art that the foregoing is intended to illustrate, and not restrict, the
way in which an array of target:oligonucleotide hybrid complexes can be treated with a nuclease to improve hybridization
signals on high density oligonucleotide arrays.

In another aspect, the present invention provides a method for obtaining sequencing information about an unlabeled
target oligonucleotide, comprising: (a) contacting an unlabeled target oligonucleotide with a library of labeled
oligonucleotide probes, each of the oligonucleotide probes having a known sequence and being attached to a solid support at
a known position, to hybridize the target oligonucleotide to at least one member of the library of probes, thereby forming
a hybridized library; (b) contacting the hybridized library with a nuclease capable of cleaving double-stranded
oligonucleotides to release from the hybridized library a portion of the labeled oligonucleotide probes or fragments thereof;
and (c) identifying the positions of the hybridized library from which labeled probes or fragments thereof have been
removed, to determine the sequence of the unlabeled target oligonucleotide.

In this aspect of the invention, a library of oligonucleotide probes is prepared, for example, using the VLSIPS™
technology described above (See, Section III, supra). Once the library of probes has been prepared, the 5' terminus of
each probe can be labeled with a detectable label such as those described in Section V, infra. Preferably, the label is a
fluorescent label.

The library of labeled oligonucleotide probes is then contacted with an unlabeled target oligonucleotide. The
unlabeled oligonucleotide can be synthetic or can be isolated from natural sources. In preferred embodiments, the unlabeled
oligonucleotide is genomic DNA or RNA. For example, purified DNA or a whole-cell digest which has been partially
sequenced can be lightly fragmented (e.g., by digestion with a restriction enzyme which provides infrequent cuts and
which infrequently cuts within any of the regions desired to be resequenced). The fragments of interest can be separated
using a column containing probes complementary to a part of the sequence of interest. The complementary fragments
are bound in the column while the remaining DNA is washed through. The fragments of interest are then removed (e.g.,
by heat or by chemical means) and contacted with the library of probes.

Once the library of probes has been contacted with the target oligonucleotide under conditions sufficient for
hybridization to occur, the resulting hybridized library is contacted with an appropriate nuclease enzyme. Alternatively, the
nuclease can be introduced to the library in the same mixture as the target oligonucleotide. The nuclease can be any
of a variety of commercially available nucleases which are capable of cleaving double-stranded DNA. Examples of such
nucleases include DNase I.

The hybridized library which has been contacted with the nuclease is then washed to remove the label from those
positions wherein hybridization has taken place. By scanning the washed library with a detector to determine the
presence or absence of labels in a region, hybridization information can be obtained. This method is applicable to
resequencing tilings (see, Section IV, supra), mutation detection and other combinatorial methods. Other advantages exist to the
present method, including (i) the use of unlabeled target oligonucleotide, which simplifies target preparation and allows
genomic material to be used directly, (ii) the use of a variety of nucleases which can be selected for cleaving the target
and probe, the probe alone, or probe-probe interaction, and (iii) application using existing VLSIPS technology.
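
The readout of this method, in which hybridized positions are those that lose their label, can be sketched as a comparison of two scans. The use of a scan taken before nuclease treatment as a baseline, the retention threshold of 0.5, the function name and all signal values are assumptions introduced for illustration; the text itself only requires detecting the presence or absence of label after washing.

```python
# Illustrative sketch: the baseline scan, threshold and values are hypothetical.
def hybridized_positions(before, after, max_retained=0.5):
    """Return array positions whose label was largely removed by the nuclease.

    before and after map an array position, e.g. (row, column), to the
    fluorescence measured before and after nuclease treatment and washing.
    """
    hits = []
    for position, initial in before.items():
        if initial <= 0:
            continue  # no labeled probe detected here to begin with
        if after.get(position, 0.0) / initial <= max_retained:
            hits.append(position)
    return sorted(hits)


before = {(0, 0): 1000.0, (0, 1): 980.0, (1, 0): 1020.0, (1, 1): 990.0}
after = {(0, 0): 950.0, (0, 1): 120.0, (1, 0): 1000.0, (1, 1): 90.0}
print(hybridized_positions(before, after))
```

Because each position holds a probe of known sequence, the list of positions that lost label indicates which probe sequences found a complementary subsequence in the unlabeled target.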

The foregoing enzymatic discrimination enhancement methods can be used in all instances where improved
discrimination between fully complementary hybrids and those that differ by one or more base pairs would be helpful. More
particularly, such methods can be used to more accurately determine the sequence (e.g., de novo sequencing), or
monitor mutations, or resequence the target nucleic acid (i.e., such methods can be used in conjunction with a second
sequencing procedure to provide independent verification).
B. Enhanced Discrimination Using Ligation Reactions

Ligation reactions can be used to discriminate between fully complementary hybrids and those that differ by one or
more base pairs. More particularly, an array of oligonucleotides is generated on a substrate (in the 3' to 5' direction)
using any one of the methods described above. The oligonucleotides in the array are generally shorter in length than
the target nucleic acid so that when hybridized to the target nucleic acid, the target nucleic acid generally has a 3'
overhang. In this embodiment, the target nucleic acid is not necessarily labelled. After the array of oligonucleotides has
been combined with the target nucleic acid to form target-oligonucleotide hybrid complexes, the target-oligonucleotide
hybrid complexes are contacted with a ligase and a labelled, ligatable probe or, alternatively, with a pool of labelled,
ligatable probes. The ligation reaction of the labelled, ligatable probes to the 5' end of the oligonucleotide probes on the
substrate will occur, in the presence of the ligase, predominantly when the target:oligonucleotide hybrid has formed with
correct base-pairing near the 5' end of the oligonucleotide probe and where there is a suitable 3' overhang of the target
nucleic acid to serve as a template for hybridization and ligation. After the ligation reaction, the substrate is washed
(multiple times if necessary) with water at a temperature of about 40°C to 50°C to remove the target nucleic acid and
the labelled, unligated probes. Thereafter, a quantitative fluorescence image of the hybridization pattern is obtained by
scanning the substrate with, for example, a confocal microscope, and labelled oligonucleotide probes, i.e., the
oligonucleotide probes which are perfectly complementary to the target nucleic acid, are identified. Using this information,
sequence information about the target nucleic acid can be determined.

Any enzyme that catalyzes the formation of a phosphodiester bond at the site of a single-stranded break in duplex
DNA can be used to enhance discrimination between fully complementary hybrids and those that differ by one or more
base pairs. Such ligases include, but are not limited to, T4 DNA ligase, ligases isolated from E. coli and ligases isolated
from other bacteriophages. The concentration of the ligase will vary depending on the particular ligase used, the
concentration of target and buffer conditions, but will typically range from about 500 units/ml to about 5,000 units/ml.
Moreover, the time in which the array of target:oligonucleotide hybridization complexes is in contact with the ligase will vary.
Typically, the ligase treatment is carried out for a period of time ranging from minutes to hundreds of hours.

In a further embodiment, the present invention provides another method which can be used to improve discrimination
of base-pair mismatches near the 5' end of the immobilized probes. More particularly, the present invention provides a
method for sequencing an unlabeled target oligonucleotide, the method comprising: (a) combining: (i) a substrate
comprising an array of positionally distinguishable oligonucleotide probes each of which has a constant region and a variable
region, the variable region capable of binding to a defined subsequence of preselected length; (ii) a constant
oligonucleotide having a sequence which is complementary to the constant region of the oligonucleotide probes; (iii) a target
oligonucleotide whose sequence is to be determined; and (iv) a ligase, thereby forming target oligonucleotide-oligonucleotide
probe hybrid complexes of complementary subsequences of known sequence; (b) contacting the target
oligonucleotide-oligonucleotide probe hybrid complexes with a ligase and a pool of labelled, ligatable oligonucleotide probes
of a preselected length, the pool of labelled, ligatable oligonucleotide probes representing all possible sequences of the
preselected length; (c) removing unbound target nucleic acid and labelled, unligated oligonucleotide probes; and (d)
determining which of the oligonucleotide probes contain the labelled, ligatable oligonucleotide probe as an indication of
a subsequence which is perfectly complementary to a subsequence of the target oligonucleotide. See, FIG. 8, which
illustrates this method.

In this method, the constant region is typically from about 10 to about 14 nucleotides in length, whereas the variable
region is typically from about 6 to about 8 nucleotides in length. The labelled, ligatable oligonucleotide probes have a
preselected length, and the pool of such probes represents all possible sequences of the preselected length. Thus, if
the probe is 6 nucleotides in length, all possible 6-mers are present in the pool. As with the previously described method,
any enzyme that catalyzes the formation of a phosphodiester bond at the site of a single-strand break in duplex DNA
can be used to enhance discrimination between fully complementary hybrids and those that differ by one or more base
pairs. Such ligases include, but are not limited to, T4 DNA ligase, ligases isolated from E. coli and ligases isolated from
other bacteriophages. The concentration of the ligase will vary depending on the particular ligase used, the concentration
of target and buffer conditions, but will typically range from about 500 units/ml to about 5,000 units/ml. Moreover, the
time in which the array of target oligonucleotide:oligonucleotide probe hybrid complexes is in contact with the ligase will
vary. Typically, the ligase treatment is carried out for a period of time ranging from minutes to hundreds of hours.
In addition, it will be readily apparent to those of skill that the two ligation reactions can either be done sequentially or,
alternatively, simultaneously in a single reaction mix that contains: target oligonucleotides; constant oligonucleotides; a
pool of labeled, ligatable probes; and a ligase.
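
The size of a pool representing all possible sequences of a preselected length follows directly from the four-letter nucleotide alphabet. The sketch below enumerates such a pool; the function name `all_kmers` is a hypothetical helper introduced for the example.

```python
from itertools import product


# Illustrative sketch: all_kmers is a hypothetical helper name.
def all_kmers(length, alphabet="ACGT"):
    """Enumerate every oligonucleotide sequence of the given length."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


pool = all_kmers(6)
print(len(pool))          # 4**6 sequences
print(pool[0], pool[-1])  # first and last sequence in enumeration order
```

A pool of all 6-mers therefore contains 4^6 = 4,096 distinct sequences, and a pool of all 8-mers would contain 4^8 = 65,536.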

In the above method, the first ligation reaction will occur only if the 5' end of the target oligonucleotide (i.e., the last
3-4 bases) matches the variable region of the oligonucleotide probe. Similarly, the second ligation reaction, which adds
a label to the probe, will occur efficiently only if the first ligation reaction was successful and if the ligated target is
complementary to the 5' end of the probe. Thus, this method provides for specificity at both ends of the variable region.
Moreover, this method is advantageous in that it allows a shorter variable probe region to be used, increases probe:target
specificity and removes the necessity of labeling the target.
As such, ligation reactions can effectively be used to improve discrimination of base-pair mismatches near the 5'
end of the immobilized probe, mismatches that are often poorly discriminated following hybridization alone. The foregoing
discrimination enhancement methods involving the use of ligation reactions can be used in all instances where improved
discrimination between fully complementary hybrids and those that differ by one or more base pairs would be helpful.
More particularly, such methods can be used to more accurately determine the sequence (e.g., de novo sequencing),
or monitor mutations, or resequence the target nucleic acid (i.e., such methods can be used in conjunction with a second
sequencing procedure to provide independent verification). It will be readily apparent to those of skill in the art that the
foregoing is intended to illustrate, and not restrict, the way in which an array of target:oligonucleotide hybrid complexes
can be treated with a ligase and a pool of labelled, ligatable probes to improve hybridization signals on high density
oligonucleotide arrays.

VI. Detection Methods

Methods for detection depend upon the label selected. The criteria for selecting an appropriate label are discussed
below; however, a fluorescent label is preferred because of its extreme sensitivity and simplicity. Standard labeling
procedures are used to determine the positions where interactions between a target sequence and a reagent take place.
For example, if a target sequence is labeled and exposed to a matrix of different oligonucleotide probes, only those
locations where the oligonucleotides interact with the target will exhibit any signal. In addition to using a label, other
methods may be used to scan the matrix to determine where interaction takes place. The spectrum of interactions can,
of course, be determined in a temporal manner by repeated scans of interactions which occur at each of a multiplicity
of conditions. However, instead of testing each individual interaction separately, a multiplicity of sequence interactions
may be simultaneously determined on a matrix.

A. Labeling Techniques

The target nucleic acid can be labeled using any of a number of convenient detectable markers. A fluorescent label
is preferred because it provides a very strong signal with low background. It is also optically detectable at high resolution
and sensitivity through a quick scanning procedure. Other potential labeling moieties include radioisotopes,
chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers, magnetic labels, and linked
enzymes.

In another embodiment, different targets can be simultaneously sequenced where each target has a different label. 
For instance, one target could have a green fluorescent label and a second target could have a red fluorescent label. 
The scanning step will distinguish sites of binding of the red label from those binding the green fluorescent label. Each
sequence can be analyzed independently of the other.

Suitable chromogens which can be employed include those molecules and compounds which absorb light in a
distinctive range of wavelengths so that a color can be observed or, alternatively, which emit light when irradiated with
radiation of a particular wavelength or wavelength range, e.g., fluorescers.

A wide variety of suitable dyes are available, being primarily chosen to provide an intense color with minimal
absorption by their surroundings. Illustrative dye types include quinoline dyes, triarylmethane dyes, acridine dyes, alizarine
dyes, phthaleins, insect dyes, azo dyes, anthraquinoid dyes, cyanine dyes, phenazathionium dyes, and phenazoxonium
dyes.

A wide variety of fluorescers can be employed either alone or, alternatively, in conjunction with quencher molecules.
Fluorescers of interest fall into a variety of categories having certain primary functionalities. These primary
functionalities include 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary phenanthridine salts,
9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin,
perylene, bisbenzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts,
hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen, 7-hydroxycoumarin,
phenoxazine, salicylate, strophanthidin, porphyrins, triarylmethanes and flavin. Individual fluorescent compounds which
have functionalities for linking or which can be modified to incorporate such functionalities include, e.g., dansyl chloride;
fluoresceins such as 3,6-dihydroxy-9-phenylxanthhydrol; rhodamineisothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene;
N-phenyl 2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2'-disulfonic acid;
pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl, N-methyl 2-aminonaphthalene-6-sulfonate; ethidium
bromide; stebrine; auromine-0,2-(9'-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N'-dioctadecyl
oxacarbocyanine; N,N'-dihexyl oxacarbocyanine; merocyanine, 4-(3'-pyrenyl)butyrate; d-3-aminodesoxy-equilenin;
12-(9'-anthroyl)stearate; 2-methylanthracene; 9-vinylanthracene; 2,2'-(vinylene-p-phenylene)bisbenzoxazole;
p-bis[2-(4-methyl-5-phenyl-oxazolyl)]benzene; 6-dimethylamino-1,2-benzophenazin; retinol; bis(3'-aminopyridinium)
1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellibrienin; chlorotetracycline;
N-(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; N-[p-(2-benzimidazolyl)-phenyl]maleimide;
N-(4-fluoranthyl)maleimide; bis(homovanillic acid); resazarin; 4-chloro-7-nitro-2,1,3-benzooxadiazole; merocyanine 540;
resorufin; rose bengal; and 2,4-diphenyl-3(2H)-furanone.

Desirably, fluorescers should absorb light above about 300 nm, preferably about 350 nm, and more preferably above
about 400 nm, usually emitting at wavelengths greater than about 10 nm higher than the wavelength of the light absorbed.
It should be noted that the absorption and emission characteristics of the bound dye can differ from the unbound dye.
Therefore, when referring to the various wavelength ranges and characteristics of the dyes, it is intended to indicate the
dyes as employed and not the dye which is unconjugated and characterized in an arbitrary solvent.

Fluorescers are generally preferred because by irradiating a fluorescer with light, one can obtain a plurality of
emissions. Thus, a single label can provide for a plurality of measurable events.

Detectable signal can also be provided by chemiluminescent and bioluminescent sources. Chemiluminescent
sources include a compound which becomes electronically excited by a chemical reaction and can then emit light which
serves as the detectable signal or donates energy to a fluorescent acceptor. A diverse number of families of compounds
have been found to provide chemiluminescence under a variety of conditions. One family of compounds is
2,3-dihydro-1,4-phthalazinedione. The most popular compound is luminol, which is the 5-amino compound. Other members of the
family include the 5-amino-6,7,8-trimethoxy- and the dimethylamino[ca]benz analog. These compounds can be made
to luminesce with alkaline hydrogen peroxide or calcium hypochlorite and base. Another family of compounds is the
2,4,5-triphenylimidazoles, with lophine as the common name for the parent product. Chemiluminescent analogs include
para-dimethylamino and -methoxy substituents. Chemiluminescence can also be obtained with oxalates, usually oxalyl
active esters, e.g., p-nitrophenyl and a peroxide, e.g., hydrogen peroxide, under basic conditions. Alternatively, luciferins
can be used in conjunction with luciferase or lucigenins to provide bioluminescence.

Spin labels are provided by reporter molecules with an unpaired electron spin which can be detected by electron
spin resonance (ESR) spectroscopy. Exemplary spin labels include organic free radicals, transition metal complexes,
particularly vanadium, copper, iron, and manganese, and the like. Exemplary spin labels include nitroxide free radicals.

B. Scanning System 

With the automated detection apparatus, the correlation of specific positional labeling is converted to the presence
on the target of sequences for which the oligonucleotides have specificity of interaction. Thus, the positional information
is directly converted to a database indicating what sequence interactions have occurred. For example, in a nucleic acid
hybridization application, the sequences which have interacted between the substrate matrix and the target molecule
can be directly listed from the positional information. The detection system used is described in PCT publication no.
WO90/15070; and U.S.S.N. 07/624,120. Although the detection described therein is a fluorescence detector, the detector
can be replaced by a spectroscopic or other detector. The scanning system can make use of a moving detector relative
to a fixed substrate, a fixed detector with a moving substrate, or a combination. Alternatively, mirrors or other apparatus
can be used to transfer the signal directly to the detector. See, e.g., U.S.S.N. 07/624,120, which is hereby incorporated
herein by reference.

The detection method will typically also incorporate some signal processing to determine whether the signal at a
particular matrix position is a true positive or may be a spurious signal. For example, a signal from a region which has
actual positive signal may tend to spread over and provide a positive signal in an adjacent region which actually should
not have one. This may occur, e.g., where the scanning system is not properly discriminating with sufficiently high
resolution in its pixel density to separate the two regions. Thus, the signal over the spatial region may be evaluated pixel
by pixel to determine the locations and the actual extent of positive signal. A true positive signal should, in theory, show
a uniform signal at each pixel location. Thus, processing by plotting number of pixels with actual signal intensity should
have a clearly uniform signal intensity. Regions where the signal intensities show a fairly wide dispersion may be
particularly suspect and the scanning system may be programmed to more carefully scan those positions.
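As an illustration only, the pixel-by-pixel evaluation described above might be sketched as follows. The function name, the mean-intensity threshold, and the use of the coefficient of variation as the measure of dispersion are assumptions made for this sketch; the text does not prescribe a particular statistic or cutoff.

```python
from statistics import mean, pstdev

def classify_region(pixels, threshold, max_cv=0.25):
    """Classify one probe region from its per-pixel intensities.

    threshold (a positive number) and max_cv are illustrative values,
    not values taken from the specification.
    """
    values = [v for row in pixels for v in row]
    avg = mean(values)
    if avg < threshold:
        return "negative"
    # Coefficient of variation: spread of the pixels relative to their mean.
    cv = pstdev(values) / avg
    # A wide dispersion marks the region as suspect, to be scanned again.
    return "positive" if cv <= max_cv else "rescan"

uniform = [[100, 102, 98], [101, 99, 100], [100, 100, 100]]
bleed = [[100, 80, 10], [90, 40, 5], [60, 20, 0]]  # signal spilling in from a neighbor
dark = [[2, 3, 1], [0, 2, 1], [1, 1, 2]]

for name, region in (("uniform", uniform), ("bleed", bleed), ("dark", dark)):
    print(name, classify_region(region, threshold=30))
```

Under these assumed cutoffs the uniform region is reported as positive, the region with a strong intensity gradient is flagged for a more careful scan, and the dark region is reported as negative.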

More sophisticated signal processing techniques can be applied to the initial determination of whether a positive
signal exists or not. See, e.g., U.S.S.N. 07/624,120.

From a listing of those sequences which interact, data analysis may be performed on a series of sequences. For
example, in a nucleic acid sequence application, each of the sequences may be analyzed for their overlap regions and
the original target sequence may be reconstructed from the collection of specific subsequences obtained therein. Other
sorts of analyses for different applications may also be performed, and because the scanning system directly interfaces
with a computer the information need not be transferred manually. This provides for the ability to handle large amounts
of data with very little human intervention. This, of course, provides significant advantages over manual manipulations.
Increased throughput and reproducibility is thereby provided by the automation of the vast majority of steps in any of these
applications.

C. Data Analysis

Data analysis will differ depending upon whether sequencing de novo or resequencing is being done, but will typically
involve aligning the proper sequences with their overlaps to determine the target sequence or a mutation in the target
sequence. Although the target "sequence" may not specifically correspond to any specific molecule, especially where
the target sequence is broken and fragmented up in the sequencing process, the sequence corresponds to a contiguous
sequence of the subfragments.

The data analysis can be performed manually or, preferably, by a computer using an appropriate program. Although
the specific manipulations necessary to reassemble the target sequence from fragments may take many forms, one
embodiment uses a sorting program to sort all of the subsequences using a defined hierarchy. The hierarchy need not
necessarily correspond to any physical hierarchy, but provides a means to determine, in order, which subfragments have
actually been found in the target sequence. In this manner, overlaps can be checked and found directly rather than
having to search throughout the entire set after each selection process. For example, where the oligonucleotide probes
are 10-mers, the first 9 positions can be sorted. A particular subsequence can be selected, as in the examples, to
determine where the process starts. As analogous to the theoretical example provided above, the sorting procedure provides
the ability to immediately find the position of the subsequence which contains the first 9 positions and can compare
whether there exists more than 1 subsequence during the first 9 positions. In fact, the computer can easily generate all
of the possible target sequences which contain given combinations of subsequences. Typically, there will be only one,
but in various situations, there will be more.
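The sorting approach can be illustrated with a minimal sketch, assuming that every observed subsequence has the same length k, that each occurs exactly once in the target, and that a starting subsequence has been chosen. The function names and the use of alphabetical order as the "defined hierarchy" are assumptions of this sketch, and 4-mers are used in place of 10-mers to keep the example short.

```python
from bisect import bisect_left

def successors(sorted_subs, current):
    """Return the subsequences whose first k-1 bases equal the last k-1 of current."""
    prefix = current[1:]
    # In a sorted list, all entries sharing a prefix are adjacent, so one
    # binary search finds them without scanning the entire set.
    i = bisect_left(sorted_subs, prefix)
    found = []
    while i < len(sorted_subs) and sorted_subs[i].startswith(prefix):
        found.append(sorted_subs[i])
        i += 1
    return found

def reconstruct(subs, start):
    """List every sequence that begins with start and uses each subsequence once."""
    sorted_subs = sorted(set(subs))
    results = []

    def extend(seq, last, used):
        if len(used) == len(sorted_subs):
            results.append(seq)
            return
        for nxt in successors(sorted_subs, last):
            if nxt not in used:
                # Each overlap of k-1 bases adds one new base to the sequence.
                extend(seq + nxt[-1], nxt, used | {nxt})

    extend(start, start, {start})
    return results

observed = ["GTTG", "ACGT", "TGCA", "CGTT", "TTGC"]
print(reconstruct(observed, "ACGT"))  # ['ACGTTGCA']
```

When a (k-1)-base overlap is shared by more than one subsequence, `successors` returns several candidates and `reconstruct` may return more than one possible target sequence, which corresponds to the ambiguous situations mentioned above.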

Generally, such computer programs provide for automated scanning of the substrate to determine the positions of
oligonucleotide and target interaction. Simple processing of the intensity of the signal may be incorporated to filter out
clearly spurious signals. The positions with positive interaction are correlated with the sequence specificity of specific
matrix positions, to generate the set of matching subsequences. This information is further correlated with other target
sequence information, e.g., restriction fragment analysis. The sequences are then aligned using overlap data, thereby
leading to possible corresponding target sequences which will, optimally, correspond to a single target sequence.
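The step of correlating positive positions with the sequence specificity of each matrix position can be pictured with a short sketch. The dictionary layout, the intensity values, and the fixed cutoff used to filter spurious signals are hypothetical and serve only to illustrate the idea.

```python
def matching_subsequences(intensities, layout, cutoff):
    """Return the probe sequences at positions whose signal exceeds cutoff.

    intensities maps (row, col) to a measured signal; layout maps (row, col)
    to the probe sequence synthesized there. Both are hypothetical inputs.
    """
    return sorted(
        layout[pos]
        for pos, signal in intensities.items()
        if signal > cutoff and pos in layout
    )

layout = {(0, 0): "ACGT", (0, 1): "CGTT", (1, 0): "GGGG", (1, 1): "GTTG"}
intensities = {(0, 0): 950, (0, 1): 870, (1, 0): 12, (1, 1): 910}

print(matching_subsequences(intensities, layout, cutoff=100))
# ['ACGT', 'CGTT', 'GTTG']
```

The resulting list of matching subsequences is the input that an overlap-alignment step would then assemble into candidate target sequences.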

VII. Applications

The enzymatic discrimination enhancement methods provided by the present invention have very broad applications.
Although described specifically for polynucleotide sequences, similar sequencing, fingerprinting, mapping, and
screening procedures may be applied to polypeptide, carbohydrate, or other polymers. Such methods can be used in all
instances where improved discrimination between fully complementary hybrids and those that differ by one or more
base pairs would be helpful. More particularly, such methods can be used with de novo sequencing, or in conjunction
with a second sequencing procedure to provide independent verification (i.e., resequencing). See, e.g., Science
242:1245 (1988). For example, a large polynucleotide sequence defined by either the Maxam and Gilbert technique or
by the Sanger technique may be verified by using the present invention.

In addition, by selection of appropriate probes, a polynucleotide sequence can be fingerprinted. Fingerprinting is a
less detailed sequence analysis which usually involves the characterization of a sequence by a combination of defined
features. Sequence fingerprinting is particularly useful because the repertoire of possible features which can be tested
is virtually infinite. Moreover, the stringency of matching is also variable depending upon the application. A Southern
Blot analysis may be characterized as a means of simple fingerprint analysis.

Fingerprinting analysis may be performed to the resolution of specific nucleotides, or may be used to determine
homologies, most commonly for large segments. In particular, an array of oligonucleotide probes of virtually any workable
size may be positionally localized on a matrix and used to probe a sequence for either absolute complementary matching,
or homology to the desired level of stringency using selected hybridization conditions.

In addition, the present invention provides means for mapping analysis of a target sequence or sequences. Mapping
will usually involve the sequential ordering of a plurality of various sequences, or may involve the localization of a
particular sequence within a plurality of sequences. This may be achieved by immobilizing particular large segments onto
the matrix and probing with a shorter sequence to determine which of the large sequences contain that smaller sequence.
Alternatively, relatively shorter probes of known or random sequence may be immobilized to the matrix and a map of
various different target sequences may be determined from overlaps. Principles of such an approach are described in
some detail by Evans et al. (1989) "Physical Mapping of Complex Genomes by Cosmid Multiplex Analysis," Proc. Natl.
Acad. Sci. USA 86:5030-5034; Michiels, et al., "Molecular Approaches to Genome Analysis: A Strategy for the
Construction of Ordered Overlap Clone Libraries," CABIOS 3:203-210 (1987); Olsen, et al., "Random-Clone Strategy for
Genomic Restriction Mapping in Yeast," Proc. Natl. Acad. Sci. USA 83:7826-7830 (1986); Craig, et al., "Ordering of
Cosmid Clones Covering the Herpes Simplex Virus Type I (HSV-I) Genome: A Test Case for Fingerprinting by
Hybridization," Nuc. Acids Res. 18:2653-2660 (1990); and Coulson, et al., "Toward a Physical Map of the Genome of the
Nematode Caenorhabditis elegans," Proc. Natl. Acad. Sci. USA 83:7821-7825 (1986); each of which is hereby
incorporated herein by reference.

Fingerprinting analysis also provides a means of identification. In addition to its value in apprehension of criminals
from whom a biological sample, e.g., blood, has been collected, fingerprinting can ensure personal identification for
other reasons. For example, it may be useful for identification of bodies in tragedies such as fire, flood, and vehicle
crashes. In other cases the identification may be useful in identification of persons suffering from amnesia, or of missing
persons. Other forensics applications include establishing the identity of a person, e.g., military identification "dog tags",
or may be used in identifying the source of particular biological samples. Fingerprinting technology is described, e.g.,
in Carrano, et al., "A High-Resolution, Fluorescence-Based, Semi-automated Method for DNA Fingerprinting," Genomics
4:120-136 (1989), which is hereby incorporated herein by reference.

The fingerprinting analysis may be used to perform various types of genetic screening. For example, a single
substrate may be generated with a plurality of screening probes, allowing for the simultaneous genetic screening for a large
number of genetic markers. Thus, prenatal or diagnostic screening can be simplified, economized, and made more
generally accessible.

In addition to the sequencing, fingerprinting, and mapping applications, the present invention also provides means
for determining specificity of interaction with particular sequences. Many of these applications are described in U.S.S.N.
07/362,901 (VLSIPS parent), U.S.S.N. 07/492,462 (VLSIPS CIP), U.S.S.N. 07/435,316 (caged biotin parent), and
U.S.S.N. 07/612,671 (caged biotin CIP), which are incorporated herein by reference.

VIII. Libraries of Unimolecular, Double-Stranded Oligonucleotides

In one aspect, the present invention provides libraries of unimolecular double-stranded oligonucleotides, each
member of the library having the formula:

Y-L¹-X¹-L²-X²

in which Y represents a solid support, X¹ and X² represent a pair of complementary oligonucleotides, L¹ represents a
bond or a spacer, and L² represents a linking group having sufficient length such that X¹ and X² form a double-stranded
oligonucleotide.

The solid support may be biological, nonbiological, organic, inorganic, or a combination of any of these, existing as
particles, strands, precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates, slides,
etc. The solid support is preferably flat but may take on alternative surface configurations. For example, the solid support
may contain raised or depressed regions on which synthesis takes place. In some embodiments, the solid support will
be chosen to provide appropriate light-absorbing characteristics. For example, the support may be a polymerized
Langmuir Blodgett film, functionalized glass, Si, Ge, GaAs, GaP, SiO2, SiN4, modified silicon, or any one of a variety of gels
or polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations
thereof. Other suitable solid support materials will be readily apparent to those of skill in the art. Preferably, the surface
of the solid support will contain reactive groups, which could be carboxyl, amino, hydroxyl, thiol, or the like. More
preferably, the surface will be optically transparent and will have surface Si-OH functionalities, such as are found on silica
surfaces.

Attached to the solid support is an optional spacer, L¹. The spacer molecules are preferably of sufficient length to
permit the double-stranded oligonucleotides in the completed member of the library to interact freely with molecules
exposed to the library. The spacer molecules, when present, are typically 6-50 atoms long to provide sufficient exposure
for the attached double-stranded DNA molecule. The spacer, L¹, is comprised of a surface attaching portion and a longer
chain portion. The surface attaching portion is that part of L¹ which is directly attached to the solid support. This portion
can be attached to the solid support via carbon-carbon bonds using, for example, supports having
(poly)trifluorochloroethylene surfaces, or preferably, by siloxane bonds (using, for example, glass or silicon oxide as the solid support).
Siloxane bonds with the surface of the support are formed in one embodiment via reactions of surface attaching portions
bearing trichlorosilyl or trialkoxysilyl groups. The surface attaching groups will also have a site for attachment of the
longer chain portion. For example, groups which are suitable for attachment to a longer chain portion would include
amines, hydroxyl, thiol, and carboxyl. Preferred surface attaching portions include aminoalkylsilanes and
hydroxyalkylsilanes. In particularly preferred embodiments, the surface attaching portion of L¹ is either
bis(2-hydroxyethyl)aminopropyltriethoxysilane, 2-hydroxyethylaminopropyltriethoxysilane, aminopropyltriethoxysilane or
hydroxypropyltriethoxysilane.

The longer chain portion can be any of a variety of molecules which are inert to the subsequent conditions for
polymer synthesis. These longer chain portions will typically be aryl acetylene, ethylene glycol oligomers containing
2-14 monomer units, diamines, diacids, amino acids, peptides, or combinations thereof. In some embodiments, the longer
chain portion is a polynucleotide. The longer chain portion which is to be used as part of L¹ can be selected based upon
its hydrophilic/hydrophobic properties to improve presentation of the double-stranded oligonucleotides to certain
receptors, proteins or drugs. The longer chain portion of L¹ can be constructed of polyethyleneglycols, polynucleotides,
alkylene, polyalcohol, polyester, polyamine, polyphosphodiester and combinations thereof. Additionally, for use in
synthesis of the libraries of the invention, L¹ will typically have a protecting group attached to a functional group (i.e.,
hydroxyl, amino or carboxylic acid) on the distal or terminal end of the chain portion (opposite the solid support). After
deprotection and coupling, the distal end is covalently bound to an oligomer.

Attached to the distal end of L¹ is an oligonucleotide, X¹, which is a single-stranded DNA or RNA molecule. The
oligonucleotides which are part of the present invention are typically of from about 4 to about 100 nucleotides in length.
Preferably, X¹ is an oligonucleotide which is about 6 to about 30 nucleotides in length. The oligonucleotide is typically
linked to L¹ via the 3'-hydroxyl group of the oligonucleotide and a functional group on L¹ which results in the formation
of an ether, ester, carbamate or phosphate ester linkage.

Attached to the distal end of X¹ is a linking group, L², which is flexible and of sufficient length that X¹ can effectively
hybridize with X². The length of the linker will typically be a length which is at least the length spanned by two nucleotide
monomers, and preferably at least four nucleotide monomers, while not being so long as to interfere with either the pairing
of X¹ and X² or any subsequent assays. The linking group itself will typically be an alkylene group (of from about 6 to
about 24 carbons in length), a polyethyleneglycol group (of from about 2 to about 24 ethyleneglycol monomers in a linear
configuration), a polyalcohol group, a polyamine group (e.g., spermine, spermidine and polymeric derivatives thereof),
a polyester group (e.g., poly(ethyl acrylate) having of from 3 to 15 ethyl acrylate monomers in a linear configuration), a
polyphosphodiester group, or a polynucleotide (having from about 2 to about 12 nucleic acids). Preferably, the linking
group will be a polyethyleneglycol group which is at least a tetraethyleneglycol, and more preferably, from about 1 to 4
hexaethyleneglycols linked in a linear array. For use in synthesis of the compounds of the invention, the linking group
will be provided with functional groups which can be suitably protected or activated. The linking group will be covalently
attached to each of the complementary oligonucleotides, X¹ and X², by means of an ether, ester, carbamate, phosphate
ester or amine linkage. The flexible linking group L² will be attached to the 5'-hydroxyl of the terminal monomer of X¹
and to the 3'-hydroxyl of the initial monomer of X². Preferred linkages are phosphate ester linkages which can be formed
in the same manner as the oligonucleotide linkages which are present in X¹ and X². For example, hexaethyleneglycol
can be protected on one terminus with a photolabile protecting group (i.e., NVOC or MeNPOC) and activated on the
other terminus with 2-cyanoethyl-N,N-diisopropylamino-chlorophosphite to form a phosphoramidite. This linking group
can then be used for construction of the libraries in the same manner as the photolabile-protected,
phosphoramidite-activated nucleotides. Alternatively, ester linkages to X¹ and X² can be formed when the L² has terminal carboxylic acid
moieties (using the 5'-hydroxyl of X¹ and the 3'-hydroxyl of X²). Other methods of forming ether, carbamate or amine
linkages are known to those of skill in the art and particular reagents and references can be found in such texts as March,
Advanced Organic Chemistry, 4th Ed., Wiley-Interscience, New York, NY, 1992, incorporated herein by reference.

The oligonucleotide, X², which is covalently attached to the distal end of the linking group is, like X¹, a single-stranded
DNA or RNA molecule. The oligonucleotides which are part of the present invention are typically of from about 4 to about
100 nucleotides in length. Preferably, X² is an oligonucleotide which is about 6 to about 30 nucleotides in length and
exhibits complementarity to X¹ of from 90 to 100%. More preferably, X¹ and X² are 100% complementary. In one group
of embodiments, either X¹ or X² will further comprise a bulge or loop portion and exhibit complementarity of from 90 to
100% over the remainder of the oligonucleotide.
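The 90 to 100% complementarity figure can be made concrete with a small sketch. It assumes that both strands are DNA written 5' to 3', that they are of equal length, and that they pair in antiparallel orientation without gaps, bulges, or loops; the function name and example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def percent_complementarity(x1, x2):
    """Percent of positions at which x2 base-pairs with x1 (antiparallel).

    Assumes equal-length DNA strands written 5'->3'; bulges and loops
    are not modelled in this sketch.
    """
    if len(x1) != len(x2):
        raise ValueError("strands must have equal length")
    # Read x2 from its 3' end so that it lines up against x1 read 5'->3'.
    paired = sum(COMPLEMENT[a] == b for a, b in zip(x1, reversed(x2)))
    return 100.0 * paired / len(x1)

x1 = "ACGTACGTAC"
perfect = "GTACGTACGT"       # exact reverse complement of x1
one_mismatch = "GTACGTACGA"  # differs from perfect at the final base

print(percent_complementarity(x1, perfect))       # 100.0
print(percent_complementarity(x1, one_mismatch))  # 90.0
```

For a 10-mer, a single mismatch gives 90%, the lower end of the stated range; longer oligonucleotides tolerate proportionally more mismatches within the same range.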

In a particularly preferred embodiment, the solid support is a silica support, the spacer is a polyethyleneglycol
conjugated to an aminoalkylsilane, the linking group is a polyethyleneglycol group, and X¹ and X² are complementary
oligonucleotides each comprising of from 6 to 30 nucleic acid monomers.

The library can have virtually any number of different members, and will be limited only by the number or variety of
compounds desired to be screened in a given application and by the synthetic capabilities of the practitioner. In one
group of embodiments, the library will have from 2 up to 100 members. In other groups of embodiments, the library will
have between 100 and 10,000 members, and between 10,000 and 1,000,000 members, preferably on a solid support.
In preferred embodiments, the library will have a density of more than 100 members at known locations per cm²,
preferably more than 1,000 per cm², more preferably more than 10,000 per cm².

Preparation of these libraries can typically be carried out using any of the methods described above for the
preparation of oligonucleotides on a solid support (e.g., light-directed methods, flow channel or spotting methods).

IX. Libraries of Conformationally Restricted Probes

In still another aspect, the present invention provides libraries of conformationally-restricted probes. Each of the
members of the library comprises a solid support having an optional spacer which is attached to an oligomer of the
formula:

X¹¹-Z-X¹²

in which X¹¹ and X¹² are complementary oligonucleotides and Z is a probe. The probe will have sufficient length such
that X¹¹ and X¹² form a double-stranded DNA portion of each member. X¹¹ and X¹² are as described above for X¹ and
X² respectively, except that for the present aspect of the invention, each member of the probe library can have the same
X¹¹ and the same X¹², and differ only in the probe portion. In one group of embodiments, X¹¹ and X¹² are either a
poly-A oligonucleotide or a poly-T oligonucleotide.

As noted above, each member of the library will typically have a different probe portion. The probes, Z, can be any
of a variety of structures for which receptor-probe binding information is sought for conformationally-restricted forms.
For example, the probe can be an agonist or antagonist for a cell membrane receptor, a toxin, venom, viral epitope,
hormone, peptide, enzyme, cofactor, drug, protein or antibody. In one group of embodiments, the probes are different
peptides, each having of from about 4 to about 12 amino acids. Preferably the probes will be linked via polyphosphate
diesters, although other linkages are also suitable. For example, the last monomer employed on the X¹¹ chain can be
a 5'-aminopropyl-functionalized phosphoramidite nucleotide (available from Glen Research, Sterling, Virginia, USA or
Genosys Biotechnologies, The Woodlands, Texas, USA) which will provide a synthesis initiation site for the carboxy to
amino synthesis of the peptide probe. Once the peptide probe is formed, a 3'-succinylated nucleoside (from Cruachem,
Sterling, Virginia, USA) will be added under peptide coupling conditions. In yet another group of embodiments, the
probes will be oligonucleotides of from 4 to about 30 nucleic acid monomers which will form a DNA or RNA hairpin
structure. For use in synthesis, the probes can also have associated functional groups (i.e., hydroxyl, amino, carboxylic
acid, anhydride and derivatives thereof) for attaching two positions on the probe to each of the complementary
oligonucleotides.

The surface of the solid support is preferably provided with a spacer molecule, although it will be understood that
the spacer molecules are not elements of this aspect of the invention. Where present, the spacer molecules will be as
described above for L¹.

The libraries of conformationally restricted probes can also have virtually any number of members. As above, the
number of members will be limited only by design of the particular screening assay for which the library will be used,
and by the synthetic capabilities of the practitioner. In one group of embodiments, the library will have from 2 to 100
members. In other groups of embodiments, the library will have between 100 and 10,000 members, and between 10,000
and 1,000,000 members. Also as above, in preferred embodiments, the library will have a density of more than 100
members at known locations per cm², preferably more than 1,000 per cm², more preferably more than 10,000 per cm².

Preparation of these libraries can typically be carried out using any of the methods described above for the
preparation of oligonucleotides on a solid support (e.g., light-directed methods, flow channel or spotting methods).

X. Libraries of Intermolecular, Doubly-Anchored, Double-Stranded Oligonucleotides

In another aspect, the present invention provides libraries of intermolecular, doubly-anchored, double-stranded
oligonucleotides, each member of the library having the formula:




In this formula, Y represents a solid support, X³ and X⁴ represent a pair of complementary oligonucleotides, and L³
and L⁴ each represent a bond or a spacer. Typically, L³ and L⁴ are the same and are spacers having sufficient length
such that X³ and X⁴ can form a double-stranded oligonucleotide. The non-covalent binding which exists between X³
and X⁴ is represented by the dashed line.

The solid support can be any of the solid supports described herein for other aspects of the invention. Attached to
the solid support are spacers, L³ and L⁴. These spacers are the same as those described above for the unimolecular,
double-stranded oligonucleotide embodiments. Preferably, the spacers are comprised of a surface attaching portion,
which is a hydroxyalkyltriethoxysilane or an aminoalkyltriethoxysilane, and a longer chain portion which is derived from
a poly(ethylene glycol).

Attached to the distal ends of L³ and L⁴ are X³ and X⁴, respectively. X³ and X⁴ are each a single-stranded DNA or
RNA molecule. The oligonucleotides which are part of the present invention are typically of from about 4 to about 100
nucleotides in length. Preferably, X³ and X⁴ are each an oligonucleotide of about 6 to about 30 nucleotides in length.
The oligonucleotides are typically linked to L³ or L⁴ via the 3'-hydroxyl group of the oligonucleotide and a functional group
on L³ or L⁴ which results in the formation of an ether, ester, carbamate or phosphate ester linkage.

In one group of preferred embodiments, X¹ and X² are complementary oligonucleotides of about 6 to about 30
nucleotides in length, and exhibit complementarity of from 90 to 100% over their entire length. Arrays, or libraries, of
these double-stranded oligonucleotides can be used to screen samples of DNA, RNA, proteins or drugs for their
sequence-specific interactions.
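
The degree of complementarity between two strands can be illustrated with a short calculation. The following is only a minimal sketch: the function name percent_complementarity is hypothetical, both strands are assumed to be DNA of equal length written 5' to 3', and pairing is assumed to be antiparallel Watson-Crick pairing with no gaps or bulges.

```python
# Watson-Crick partners for DNA (RNA would need U in place of T).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(x1: str, x2: str) -> float:
    """Percentage of positions at which x1 pairs with x2.

    Both strands are written 5'->3'. Because a duplex is antiparallel,
    x2 is reversed before it is compared base by base with x1.
    Assumption of this sketch: equal lengths, no gaps.
    """
    if len(x1) != len(x2):
        raise ValueError("this sketch only compares strands of equal length")
    x2_reversed = x2[::-1]
    paired = sum(COMPLEMENT[a] == b for a, b in zip(x1, x2_reversed))
    return 100.0 * paired / len(x1)


def within_preferred_range(x1: str, x2: str, minimum: float = 90.0) -> bool:
    """True if the pair shows at least `minimum` percent complementarity."""
    return percent_complementarity(x1, x2) >= minimum
```

For a pair of 10-mers, a single mismatched position gives 90% complementarity, which is the lower end of the range stated above.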

In another group of preferred embodiments, the 5'-terminal region of X¹ (the distal portion with reference to the solid
support) will be complementary to the 5'-terminal region of X² (the distal portion, again with reference to the solid
support). For example, X¹ and X² can each be an oligonucleotide of from about 10 to about 30 nucleotides in length.
The 5' end of X¹ will comprise from about 4 to about 20 nucleotides which will be complementary to the 5' end of X²
(see FIG. 10E). As above, the degree of complementarity will typically be from about 90 to about 100%, preferably
about 100%. Arrays, or libraries, of this group of embodiments can be used for the hybridization and ligation of
additional oligonucleotides. With reference to FIGS. 10E and 10F, libraries of oligonucleotides which are complementary
in overlapping regions of their 5' ends can be prepared (see FIG. 10E), then incubated with additional oligonucleotides
which are complementary to the 3' ends of the surface-bound oligonucleotides. After hybridization, a continuous helix
is formed with a length equivalent to the combination of the hybridized added oligonucleotides and the complementary
portion of the surface-bound oligonucleotides. Additionally, each strand will contain a nick between the added
oligonucleotide and the surface-bound oligonucleotide. In preferred embodiments, the surface-bound oligonucleotides
are phosphorylated (chemically or enzymatically with a kinase) such that the nick can be closed with a T4 DNA ligase
to produce a contiguous intermolecular, doubly-anchored, double-stranded oligonucleotide which is longer than either
of the initially formed X¹ or X² oligonucleotides.

Another application for this aspect of the invention is hybridization enhancement. This is illustrated in FIG. 10G. As
can be seen in FIG. 10G, a library of intermolecular, doubly-anchored, double-stranded oligonucleotides is prepared as
described above and as illustrated in FIG. 10E. Target oligonucleotides, having unknown sequences at their 3' termini,
are incubated with the library. Hybridization of the 3' end of the target oligonucleotide to the complementary portion
of a library member is enhanced by the cooperative nature of formation of the extended DNA duplex. Additionally, the
hybridization step can be followed by a ligation step (when the ends of the surface-bound oligonucleotides are
phosphorylated) to further enhance the discrimination of any 3' mismatches.

The libraries of this aspect of the invention can also have virtually any number of different members, and will be
limited only by the number or variety of compounds desired to be screened in a given application and by the synthetic
capabilities of the practitioner. In one group of embodiments, the library will have from 2 up to 100 members. In other
groups of embodiments, the library will have between 100 and 10,000 members, and between 10,000 and 1,000,000
members, preferably on a solid support. In preferred embodiments, the library will have a density of more than 100
members at known locations per cm², preferably more than 1,000 per cm², more preferably more than 10,000 per cm².

Preparation of these libraries can typically be carried out using any of the methods described above for the
preparation of oligonucleotides on a solid support (e.g., light-directed methods, flow channel or spotting methods).
Typically, the oligonucleotides X¹ and X² will be synthesized as a pair in each cell of the library. Such synthesis
generally requires that synthesis initiation sites be prepared having two different and independently removable
protecting groups. For example, a solid support (e.g., a glass coverslip) can be modified with a suitable linking group
(e.g., hydroxypropyltriethoxysilane, or the mono triethoxysilylpropyl ether of a polyethylene glycol having an
appropriate length). The surface hydroxyl groups which are present following the attachment of the linking groups can
be uniformly protected with MeNPOC-Cl. Controlled irradiation can be used to deprotect about half of the hydroxyl
groups, which are subsequently protected as DMT or MMT (mono-methoxy trityl) ethers. In this manner, each cell or
portion of the solid support will have approximately equivalent numbers of two linking groups bearing independently
removable protecting groups. Synthesis of the library can then proceed in a straightforward manner by removing the
MeNPOC groups (by irradiation) in one cell and constructing oligonucleotide X¹, then removing the DMT or MMT group
in the same cell and constructing oligonucleotide X². Synthesis in each of the cells or regions can proceed in a similar
manner to produce the libraries of this aspect of the invention. In this manner, using two rounds of synthesis following
the initial steps to divide the available sites into independently protected sites, it is possible to prepare arrays, or
libraries, of regions containing pairs of complementary oligonucleotides of any sequence.

XI. Methods of Screening Libraries of Double-Stranded Oligonucleotides and Probes 

A library prepared according to any of the methods described above can be used to screen for receptors having
high affinity for unimolecular, double-stranded oligonucleotides, intermolecular, doubly-anchored, double-stranded
oligonucleotides or conformationally restricted probes. In one group of embodiments, a solution containing a marked
(labelled) receptor is introduced to the library and incubated for a suitable period of time. The library is then washed
free of unbound receptor and the probes or double-stranded oligonucleotides having high affinity for the receptor are
identified by identifying those regions on the surface of the library where markers are located. Suitable markers
include, but are not limited to, radiolabels, chromophores, fluorophores, chemiluminescent moieties and transition
metals. Alternatively, the presence of receptors may be detected using a variety of other techniques, such as an assay
with a labelled enzyme, antibody, and the like. Other techniques using various marker systems for detecting bound
receptor will be readily apparent to those skilled in the art.

In a preferred embodiment, a library prepared on a single solid support (using, for example, the VLSIPS™ technique)
can be exposed to a solution containing marked receptor such as a marked antibody. The receptor can be marked in
any of a variety of ways, but in one embodiment marking is effected with a radioactive label. The marked antibody binds
with high affinity to an immobilized antigen previously localized on the surface. After washing the surface free of
unbound receptor, the surface is placed proximate to x-ray film or phosphorimagers to identify the antigens that are
recognized by the antibody. Alternatively, a fluorescent marker may be provided and detection may be by way of a
charge-coupled device (CCD), fluorescence microscopy or laser scanning.

When autoradiography is the detection method used, the marker is a radioactive label, such as ³²P. The marker on
the surface is exposed to X-ray film or a phosphorimager, which is developed and read out on a scanner. An exposure
time of about 1 hour is typical in one embodiment. Fluorescence detection using a fluorophore label, such as
fluorescein, attached to the receptor will usually require shorter exposure times.

Quantitative assays for receptor concentrations can also be performed according to the present invention. In a direct
assay method, the surface containing localized probes, prepared as described above, is incubated with a solution
containing a marked receptor for a suitable period of time. The surface is then washed free of unbound receptor. The
amount of marker present at predefined regions of the surface is then measured and can be related to the amount of
receptor in solution. Methods and conditions for performing such assays are well-known and are presented in, for
example, L. Hood et al., Immunology, Benjamin/Cummings (1978), and E. Harlow et al., Antibodies: A Laboratory
Manual, Cold Spring Harbor Laboratory (1988). See also U.S. Pat. No. 4,376,110 for methods of performing sandwich
assays. The precise conditions for performing these steps will be apparent to one skilled in the art.

A competitive assay method for two receptors can also be employed using the present invention. Methods of
conducting competitive assays are known to those of skill in the art. One such method involves immobilizing
conformationally restricted probes on predefined regions of a surface as described above. An unmarked first receptor
is then bound to the probes on the surface having a known specific binding affinity for the receptors. A solution
containing a marked second receptor is then introduced to the surface and incubated for a suitable time. The surface
is then washed free of unbound reagents and the amount of marker remaining on the surface is measured. In another
form of competition assay, marked and unmarked receptors can be exposed to the surface simultaneously. The amount
of marker remaining on predefined regions of the surface can be related to the amount of unknown receptor in solution.
Yet another form of competition assay will utilize two receptors having different labels, for example, two different
chromophores.

In other embodiments, in order to detect receptor binding, the double-stranded oligonucleotides which are formed
with attached probes or with a flexible linking group will be treated with an intercalating dye, preferably a fluorescent
dye. The library can be scanned to establish a background fluorescence. After exposure of the library to a receptor
solution, the exposed library will be scanned or illuminated and examined for those areas in which fluorescence has
changed. Alternatively, the receptor of interest can be labeled with a fluorescent dye by methods known to those of skill
in the art and incubated with the library of probes. The library can then be scanned or illuminated, as above, and
examined for areas of fluorescence.

In instances where the libraries are synthesized on beads in a number of containers, the beads are exposed to a
receptor of interest. In a preferred embodiment the receptor is fluorescently or radioactively labelled. Thereafter, one
or more beads are identified that exhibit significant levels of, for example, fluorescence using one of a variety of
techniques. For example, in one embodiment, mechanical separation under a microscope is utilized. The identity of the
molecule on the surface of such separated beads is then identified using, for example, NMR, mass spectrometry, PCR
amplification and sequencing of the associated DNA, or the like. In another embodiment, automated sorting (i.e.,
fluorescence activated cell sorting) can be used to separate beads (bearing probes) which bind to receptors from those
which do not bind. Typically the beads will be labeled and identified by methods disclosed in Needels, et al., Proc.
Natl. Acad. Sci. USA 90:10700-10704 (1993), incorporated herein by reference.

The assay methods described above for the libraries of the present invention will have tremendous application in
such endeavors as DNA "footprinting" of proteins which bind DNA. Currently, DNA footprinting is conducted using
DNase I digestion of double-stranded DNA in the presence of a putative DNA binding protein. Gel analysis of cut and
protected DNA fragments then provides a "footprint" of where the protein contacts the DNA. This method is both labor
and time intensive. See, Galas et al., Nucleic Acid Res. 5:3157 (1978). Using the above methods, a "footprint" could be
produced using a single array of unimolecular, double-stranded oligonucleotides in a fraction of the time of conventional
methods. Typically, the protein will be labeled with a radioactive or fluorescent species and incubated with a library of
unimolecular, double-stranded DNA. Phosphorimaging or fluorescence detection will provide a footprint of those regions
on the library where the protein has bound. Alternatively, unlabeled protein can be used. When unlabeled protein is
used, the double-stranded oligonucleotides in the library will all be labeled with a marker, typically a fluorescent
marker. Incorporation of a marker into each member of the library can be carried out by terminating the oligonucleotide
synthesis with a commercially available fluorescing phosphoramidite nucleotide derivative. Following incubation with
the unlabeled protein, the library will be treated with DNase I and examined for areas which are protected from cleavage.

The assay methods described above for the libraries of the present invention can also be used in reverse drug
discovery. In such an application, a compound having known pharmacological safety or other desired properties (e.g.,
aspirin) could be screened against a variety of double-stranded oligonucleotides for potential binding. If the compound
is shown to bind to a sequence associated with, for example, tumor suppression, the compound can be further examined
for efficacy in the related diseases.

In other embodiments, probe arrays comprising β-turn mimetics can be prepared and assayed for activity against
a particular receptor. β-turn mimetics are compounds having molecular structures similar to β-turns, which are one of
the three major components in protein molecular architecture. β-turns are similar in concept to hairpin turns of
oligonucleotide strands, and are often critical recognition features for various protein-ligand and protein-protein
interactions. As a result, a library of β-turn mimetic probes can provide or suggest new therapeutic agents having a
particular affinity for a receptor which will correspond to the affinity exhibited by the β-turn and its receptor.

XII. Bioelectronic Devices and Methods

In another aspect, the present invention provides a method for the bioelectronic detection of sequence-specific
oligonucleotide hybridization. A general method and device which is useful in diagnostics in which a biochemical species
is attached to the surface of a sensor is described in U.S. Patent No. 4,562,157 (the Lowe patent), incorporated herein
by reference. The present method utilizes arrays of immobilized oligonucleotides (prepared, for example, using VLSIPS™
technology) and the known photoinduced electron transfer which is mediated by a DNA double helix structure. See,
Murphy, et al., Science 262:1025-1029 (1993). This method is useful in hybridization-based diagnostics, as a
replacement for fluorescence-based detection systems. The method of bioelectronic detection also offers higher
resolution and potentially higher sensitivity than earlier diagnostic methods involving sequencing/detecting by
hybridization. As a result, this method finds applications in genetic mutation screening and primary sequencing of
oligonucleotides. The method can also be used for Sequencing By Hybridization (SBH), which is described in co-pending
Application Ser. Nos. 08/082,937 (filed June 25, 1993) and 08/168,904 (filed December 15, 1993), each of which is
incorporated herein by reference for all purposes. This method uses a set of short oligonucleotide probes of defined
sequence to search for complementary sequences on a longer target strand of DNA. The hybridization pattern is used to
reconstruct the target DNA sequence. Thus, the hybridization analysis of large numbers of probes can be used to
sequence long stretches of DNA. In immediate applications of this hybridization methodology, a small number of probes
can be used to interrogate local DNA sequence.
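
How a hybridization pattern can be turned back into a sequence may be illustrated as follows. This is a minimal sketch under strong simplifying assumptions, not the procedure of the cited applications: hybridization is assumed to be error-free, the set of positive probes is expressed as the target subsequences they detect (each of length k, with k at least 2), the first subsequence is assumed to be known, and every overlap of k-1 bases is assumed to be unique within the target. The function names are hypothetical.

```python
def kmer_spectrum(target: str, k: int) -> set[str]:
    """All length-k subsequences of the target (the idealized positive probes)."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(spectrum: set[str], start: str) -> str:
    """Rebuild the target by chaining subsequences that overlap by k-1 bases.

    Raises ValueError when an overlap is missing or ambiguous, which is
    where this simple chaining (and a repeat-containing target) breaks down.
    """
    k = len(start)
    remaining = set(spectrum) - {start}
    sequence = start
    while remaining:
        suffix = sequence[-(k - 1):]
        candidates = [p for p in remaining if p.startswith(suffix)]
        if len(candidates) != 1:
            raise ValueError(f"no unique extension after {suffix!r}")
        sequence += candidates[0][-1]  # each step adds one new base
        remaining.remove(candidates[0])
    return sequence


# Usage: the spectrum of 4-mers from a short, repeat-free target.
spectrum = kmer_spectrum("ATGCGTCA", 4)
rebuilt = reconstruct(spectrum, "ATGC")
```

Targets that contain repeats longer than k-1 bases cannot be resolved by this chaining, which is one reason longer probes or additional information are needed for long stretches of DNA.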

In the present inventive method, hybridization is monitored using bioelectronic detection. In this method, the target
DNA, or first oligonucleotide, is provided with an electron-donor tag and then incubated with an array of oligonucleotide
probes, each of which bears an electron-acceptor tag and occupies a known position on the surface of the array. After
hybridization of the first oligonucleotide to the array has occurred, the hybridized array is illuminated to induce an
electron transfer reaction in the direction of the surface of the array. The electron transfer reaction is then detected
at the location on the surface where hybridization has taken place. Typically, each of the oligonucleotide probes in an
array will have an attached electron-acceptor tag located near the surface of the solid support used in preparation of
the array. In embodiments in which the arrays are prepared by light-directed methods (i.e., typically 3' to 5' direction),
the electron-acceptor tag will be located near the 3' position. The electron-acceptor tag can be attached either to the
3' monomer by methods known to those of skill in the art, or it can be attached to a spacing group between the 3'
monomer and the solid support. Such a spacing group will have, in addition to functional groups for attachment to the
solid support and the oligonucleotide, a third functional group for attachment of the electron-acceptor tag. The target
oligonucleotide will typically have the electron-donor tag attached at the 3' position. Alternatively, the target
oligonucleotide can be incubated with the array in the absence of an electron-donor tag. Following incubation, the
electron-donor tag can be added in solution. The electron-donor tag will then intercalate into those regions where
hybridization has occurred. An electron transfer reaction can then be detected in those regions having a continuous
DNA double helix.

The electron-donor tag can be any of a variety of complexes which participate in electron transfer reactions and
which can be attached to an oligonucleotide by a means which does not interfere with the electron transfer reaction. In
preferred embodiments, the electron-donor tag is a ruthenium (II) complex, more preferably a ruthenium (II)
(phen')2(dppz) complex.

The electron-acceptor tag can be any species which, with the electron-donor tag, will participate in an electron
transfer reaction. An example of an electron-acceptor tag is a rhodium (III) complex. A preferred electron-acceptor tag
is a rhodium (III) (phi)2(phen') complex.

In a particularly preferred embodiment, the electron-donor tag is a ruthenium (II) (phen')2(dppz) complex and the
electron-acceptor tag is a rhodium (III) (phi)2(phen') complex.

In still another aspect, the present invention provides a device for the bioelectronic detection of sequence-specific
oligonucleotide hybridization. The device will typically consist of a sensor having a surface to which an array of
oligonucleotides is attached. The oligonucleotides will be attached in pre-defined areas on the surface of the sensor and
have an electron-acceptor tag attached to each oligonucleotide. The electron-acceptor tag will be a tag which is capable
of producing an electron transfer signal upon illumination of a hybridized species, when the complementary
oligonucleotide bears an electron-donating tag. The signal will be in the direction of the sensor surface and be detected
by the sensor.

In a preferred embodiment the sensor surface will be a silicon-based surface which can sense the electronic signal
induced and, if necessary, amplify the signal. The metal contacts on which the probes will be synthesized can be treated
with an oxygen plasma prior to synthesis of the probes to enhance the silane adhesion and concentration on the surface.
The surface will further comprise a multi-gated field effect transistor, with each gate serving as a sensor and different
oligonucleotides attached to each gate. The oligonucleotides will typically be attached to the metal contacts on the
sensor surface by means of a spacer group.

The spacer group should not be too long, in order to ensure that the sensing function of the device is easily activated
by the binding interaction and subsequent illumination of the "tagged" hybridized oligonucleotides. Preferably, the
spacer group is from 3 to 12 atoms in length and will be as described above for the surface modifying portion of the
spacer group, L¹.

The oligonucleotides which are attached to the spacer group can be formed by any of the solid phase techniques
which are known to those of skill in the art. Preferably, the oligonucleotides are formed one base at a time in the
direction of the 3' terminus to the 5' terminus by the "light-directed" methods described above. The oligonucleotide can
then be modified at the 3' end to attach the electron-acceptor tag. A number of suitable methods of attachment are
known. For example, modification with the reagent Aminolink2 (from Applied Biosystems, Inc.) provides a terminal
phosphate moiety which is derivatized with an aminohexyl phosphate ester. Coupling of a carboxylic acid, which is
present on the electron-acceptor tag, to the amine can then be carried out using HOBT and DCC. Alternatively, synthesis
of the oligonucleotide can begin with a suitably derivatized and protected monomer which can then be deprotected and
coupled to the electron-acceptor tag once the complete oligonucleotide has been synthesized.

The silica surface can also be replaced by silicon nitride or oxynitride, or by an oxide of another metal, especially 
aluminum, titanium (IV) or iron (III). The surface can also be any other film, membrane, insulator or semiconductor 
overlying the sensor which will not interfere with the detection of electron transfer and to which an oligonucleotide
can be coupled.

Additionally, detection devices other than an FET can be used. For example, sensors such as bipolar transistors,
MOS transistors and the like are also useful for the detection of electron transfer signals.

XIII. Alternative Embodiments

A. Adhesives

In still another aspect, the present invention provides an adhesive comprising a pair of surfaces, each having a 
plurality of attached oligonucleotides, wherein the single-stranded oligonucleotides on one surface are complementary 
to the single-stranded oligonucleotides on the other surface. The strength and position/orientation specificity can be 
controlled using a number of factors including the number and length of oligonucleotides on each surface, the degree
of complementarity, and the spatial arrangement of complementary oligonucleotides on the surface. For example, 
increasing the number and length of the oligonucleotides on each surface will provide a stronger adhesive. Suitable 
lengths of oligonucleotides are typically from about 10 to about 70 nucleotides. Additionally, the surfaces of oligonucle- 
otides can be prepared such that adhesion occurs in an extremely position-specific manner by a suitable arrangement 
of complementary oligonucleotides in a specific pattern. Small deviations from the optimum spatial arrangement are 
energetically unfavorable as many hybridization bonds must be broken and are not reformed in any other relative orien- 
tation. 

The adhesives of the present invention will find use in numerous applications. Generally, the adhesives are useful
for adhering two surfaces to one another. More specifically, the adhesives will find application where biological
compatibility of the adhesive is desired. An example of a biological application involves use in surgical procedures where tissues
must be held in fixed positions during or following the procedure. In this application, the surfaces of the adhesive will 
typically be membranes which are compatible with the tissues to which they are attached. 

A particular advantage of the adhesives of the present invention is that when they are formed in an orientation 
specific manner, the adhesive portions will be "self-finding," that is the system will go to the thermodynamic equilibrium 
in which the two sides are matched in the predetermined, orientation specific manner. 

B. Methods For Preparing Single-Stranded Nucleic Acid Sequences

In a further embodiment, the present invention provides a method of using a chip, i.e., an array, of oligonucleotides
to direct the synthesis of long, single-stranded nucleic acid sequences. More particularly, the present invention provides
a method of directing the synthesis of a single-stranded nucleic acid sequence, the method comprising: (a) forming a
hybrid complex by combining at least two oligonucleotides which are phosphorylated at their 5' ends with a chip-bound
oligonucleotide, the chip-bound oligonucleotide having subsequences which are complementary to a subsequence of
each of the oligonucleotides; (b) contacting the hybrid complex with a ligase to form a ligated oligonucleotide; and (c)
releasing the ligated oligonucleotide from the chip-bound oligonucleotide to form a single-stranded nucleic acid
sequence.
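
The role of the chip-bound oligonucleotide as a template spanning a junction can be sketched in a few lines. This is an illustration only, under stated assumptions: all sequences are DNA written 5' to 3', the 3' end of the first oligonucleotide is joined to the phosphorylated 5' end of the second, and the number of bases on each side of the junction covered by the template (the hypothetical parameter arm) is chosen freely here and is not a value taken from this description.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Sequence of the antiparallel partner strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def junction_template(first: str, second: str, arm: int) -> str:
    """Template complementary to the 3' end of `first` and the 5' end of `second`.

    Hybridizing both oligonucleotides to this template places the 3' end of
    `first` next to the 5' phosphate of `second`, leaving a nick for a ligase.
    """
    junction = first[-arm:] + second[:arm]
    return reverse_complement(junction)


def templates_for(oligos: list[str], arm: int) -> list[str]:
    """One template per junction between consecutive oligonucleotides."""
    return [junction_template(a, b, arm) for a, b in zip(oligos, oligos[1:])]


def ligated_product(oligos: list[str]) -> str:
    """Single-stranded product after every junction has been ligated."""
    return "".join(oligos)
```

Joining n oligonucleotides in a fixed order requires n-1 junction templates, so a different order or a change near a junction calls for a different set of templates.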

The foregoing method is illustrated in FIG. 17A. As shown in FIG. 17A, the joining of Oligo 1 (O1) and Oligo 2 (O2)
is directed by a chip-bound oligonucleotide having subsequences which are complementary to the ends of O1 and O2.
The oligonucleotides, e.g., O1 and O2, are typically greater than 20 nucleotides in length and they are phosphorylated
at their 5' ends. Any enzyme that catalyzes the formation of a phosphodiester bond at the site of a single-strand break
in duplex DNA can be used in this method of the present invention. Such ligases include, but are not limited to, T4 DNA
ligase, ligases isolated from E. coli and ligases isolated from other bacteriophages. In a presently preferred embodiment,
T4 DNA ligase is the ligase used. The concentration of the ligase will vary depending on the particular ligase used, the
concentration of oligonucleotides and buffer conditions, but will typically range from about 500 units/ml to about 5,000
units/ml. Moreover, the time in which the hybrid complex is in contact with the ligase will vary. Typically, the ligase
treatment is carried out for a period of time ranging from minutes to hundreds of hours.

It will be readily apparent to those of skill in the art that using the method of the present invention, multiple
oligonucleotides, e.g., Oligos O1-O4, can be joined together by a series of ligation reactions directed by the chip-bound
oligonucleotides (See, e.g., FIG. 17B). After each ligation step, the temperature needs to be raised and/or the salt
concentration reduced to allow the ligated oligonucleotide to be released from the surface. Many cycles of hybridization,
ligation and heating will be necessary for complete synthesis. However, only a small amount of the full-length product
needs to be synthesized as it can be amplified using PCR subsequent to the ligation steps.

Moreover, it will be readily apparent to those of skill in the art that the chip can consist of a wide variety of
oligonucleotides that would allow a large number of different single-stranded nucleic acid sequences to be constructed.
The chip can have virtually any number of different oligonucleotides, and will be limited only by the number or variety
of single-stranded nucleic acid sequences desired and by the synthetic capabilities of the practitioner. In one group of
embodiments, the chip will have from 1 up to 100 members. In other groups of embodiments, the chip will have between
100 and 10,000 members, and between 10,000 and 1,000,000 members. In preferred embodiments, the chip will have
a density of more than 100 members at known locations per cm², preferably more than 1,000 per cm², more preferably
more than 10,000 per cm².

In addition to the foregoing, site-directed "mutant" sequences can be made by using "mutated" Oi oligonucleotides.
If the mutation is at an internal position of Oi, the same chip-bound oligonucleotides are appropriate for the ligation
steps. If, however, the mutation is near a junction, different chip-bound oligonucleotides will be required. The chip can
consist of a wide variety of oligonucleotides that would allow a large number of different sequences to be constructed.
Moreover, shuffled genes (Oi in a different order) can also be made using a different chip that encodes for a different
set of junctions. In addition, a family of mutant genes can be made by using pools of oligonucleotides in solution and a
chip that contains templates for all possible, correctly ordered junctions.

In another embodiment, the oligonucleotides, i.e., Oi, can be synthesized on a chip and selectively released into
solution. This embodiment can be carried out using a photo-labile linker (See, FIG. 17C). Any gene or mutant gene can
be synthesized by selectively releasing the desired oligonucleotides into solution prior to the series of ligation reactions.
This would provide an incredibly diverse mutant-generation capacity, with the specific synthetic product(s) determined
by the irradiation steps used to release the specific set of oligos (and the junctions encoded by the chip). A mutant
sequence or, alternatively, a family of mutant sequences could be simply selected by the choice of photolysis steps that
produce the desired reactant oligos. In this embodiment it is best if the photolysis wavelength of the photolabile linker
is different from the wavelength used to remove the MeNPOC group during synthesis. Moreover, the photolysis
wavelength must also be compatible with phosphoramidite synthesis steps. Such photolabile linkers include, but are not
limited to, ortho-nitrobenzyl groups and derivatives thereof.

XIV. Examples

The following examples are provided to illustrate the efficacy of the inventions herein.

A. ENHANCED DISCRIMINATION USING RNase A

This example illustrates the ability of RNase A to recognize and cut single-stranded RNA, including RNA in DNA:RNA
hybrids that is not in a perfect double-stranded structure. RNA bulges, loops, and even single base mismatches can, for
example, be recognized and cleaved by RNase A. RNase A treatment is used herein to improve the quality of RNA
hybridization signals on high density oligonucleotide arrays.

The high density array of oligonucleotide probes on a glass substrate (referred to as a "chip") is prepared using the
standard VLSIPS protocols set forth above. Moreover, the pattern of oligonucleotide probes is based on the standard
tiling strategy shown in Fig. 5. Briefly, the chip used in this example consists of an overlapping set of DNA 15-mers
covalently linked to a glass surface. A set of four probes for each nucleotide of a 1.3 kb region spanning the D-loop
region of human mitochondrial DNA (mtDNA) is present on the substrate. Each of the four probes contains a different
base (A, C, G or T) at the position being interrogated, with the substitution position being near the center of the probe.
Because the probes are specifically selected based on the mtDNA target sequence, one of the four probes will be
perfectly complementary to the mtDNA target, and the other three will contain a central base-pairing mismatch. The
mismatch probes are expected to hybridize to a lesser extent. By incorporating fluorophores into the target DNA or RNA,
the extent of hybridization at the four positions for each base can be quantitated using fluorescence imaging. In principle,
the correct target base is simply identified as the complement to the probe base giving rise to the largest hybridization
signal.
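
The tiling strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`reverse_complement`, `tiling_probes`) are hypothetical, probes are written 5' to 3', and the substitution position is taken to be the exact middle of a 15-mer, whereas the text only says "near the center".

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tiling_probes(target, probe_len=15):
    """Build one set of four probes per interrogated target position.

    Returns a list of (target_index, {probe_base: probe_sequence}) tuples.
    The probe_len and the central substitution index are assumptions.
    """
    centre = probe_len // 2
    tiles = []
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        perfect = reverse_complement(window)
        # Index in the probe that pairs with window[centre].
        sub = probe_len - 1 - centre
        probes = {b: perfect[:sub] + b + perfect[sub + 1:] for b in "ACGT"}
        tiles.append((start + centre, probes))
    return tiles


if __name__ == "__main__":
    for position, probes in tiling_probes("ACGTACGTACGTACGTACGT")[:2]:
        print(position, probes)
```

For each interrogated position exactly one of the four probes is the perfect complement of the target window; the other three carry a single central mismatch.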

Generally, a "base identification" is considered to be made if the signal in one of the four probe regions is greater
than twice as large as the signal in a nearby region that contains no oligonucleotide probes (referred to herein as the
"background"), and if the signal is at least 1.2 times as large as in the other three related probe regions on the chip. If
the signal in more than one of the probe regions is larger than twice the background, but is not greater than the other
three by at least a factor of 1.2, then a "multiple-base ambiguity" is indicated. For example, if the T-containing and the
C-containing probes have high but similar hybridization signals, a two-base ambiguity would result (a call of either the
complementary bases A or G could be made). All two-base ambiguities are possible, as well as all 3- and 4-base ambiguities.
If the most intense hybridization signal (largest by at least a factor of 1.2) is in the region that is not complementary
to the target sequence, then an "incorrect call" is made (referred to herein as a "miscall"). As shown below, the RNase
A treatment resolves multiple-base ambiguities and reduces the number of miscalls that result from hybridization of a
1.3 kb RNA target to the mitochondrial probe chip described above.
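
The calling rule just described can be written as a small function. The sketch below is illustrative only: `call_base` and its parameter names are hypothetical, and it simplifies one edge case by comparing the strongest signal against all three other probes whether or not they exceed twice the background. Deciding whether a call is a "miscall" additionally requires the known target sequence, which is not modelled here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(signals, background, bg_factor=2.0, ratio=1.2):
    """Classify one set of four probe signals.

    signals: dict mapping probe base (A, C, G, T) to fluorescence signal.
    Returns (status, target_bases), where status is "call", "ambiguous"
    or "no_call" and target_bases are the complements of the probe bases
    that cannot be separated from the strongest signal.
    """
    top = max(signals.values())
    if top <= bg_factor * background:
        return ("no_call", "")
    # A probe is "tied" with the top probe when the top signal is not
    # at least `ratio` times larger than it (the top probe ties with itself).
    tied = [base for base, s in signals.items() if top < ratio * s]
    called = "".join(sorted(COMPLEMENT[base] for base in tied))
    return ("call" if len(tied) == 1 else "ambiguous", called)


if __name__ == "__main__":
    print(call_base({"A": 100, "C": 20, "G": 15, "T": 10}, background=5))
    print(call_base({"A": 10, "C": 95, "G": 10, "T": 100}, background=5))
```

The second call reproduces the example in the text: similar signals for the T-containing and C-containing probes give a two-base ambiguity between the complementary bases A and G.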

Labelled mitochondrial RNA samples are prepared using standard PCR and in vitro transcription procedures. The
1.3 kb RNA sample is labelled by incorporation of fluorescein-labelled UTP during transcription (approximately 10% of
Us in the RNA sample are labelled). The RNA (approximately 200 nM concentration of 1.3 kb transcripts) is partially
fragmented by heating to 99.9°C for 60 minutes in 6 mM magnesium chloride, pH 8. This procedure produces a wide
range of fragment lengths, with an average length of approximately 200 nucleotides. After fragmentation, the RNA sample
is diluted to 10 nM in 60 mM sodium phosphate, 0.9 M NaCl, 6 mM EDTA, 0.05% Triton X-100, pH 7.9 (referred to as
6XSSPE-T). For hybridization, 10 mM CTAB (cetyltrimethylammonium bromide) is added. The RNA sample is hybridized
to the chip in a 1 ml flow cell at 22°C for 40 minutes with stirring provided by bubbling nitrogen gas through the flow cell.

Following hybridization, the chip is rinsed with 6XSSPE-T and the fluorescence signal is detected using a scanning
confocal fluorescence microscope ("reading" the chip) (see FIG. 6). The image is stored for later analysis. The chip is
then treated with 75 µl of 0.2 µg/ml RNase A in 6XSSPE-T at 22°C for intervals of 10, 45, and 75 minutes. After each
interval, the chip is rinsed with 6XSSPE-T and the fluorescence signal is read (see FIG. 7). The results are analyzed
to determine the number of correct base calls, multiple-base ambiguities and miscalls, and the improvement resulting
from the RNase A treatments.

After the original hybridization, 619 out of 1302 bases were called correctly (approximately 47%). Of the remaining,
there were 218 miscalls, 458 multiple-base ambiguities, and 17 instances where the signal was not more than twice the
background. (These numbers are subject to the conditions of the experiment. In particular, they are a function of
hybridization time and temperature, salt concentration, the presence of Triton X-100 and CTAB, and the extent of RNA
fragmentation and labelling. The conditions used here, in particular the limited fragmentation of the RNA, are ones that tend
to decrease the number of regions with low signal, and to increase the number of miscalls and ambiguities.) Following
treatment with RNase A (and combining the information for the three time points), 162 out of 218 miscalls were corrected,
and 350 out of 458 ambiguities were correctly resolved. There were only 46 bases that were initially ambiguous which
were resolved incorrectly, and there were no instances of correct calls that were changed to incorrect calls after RNase
A treatment. After the initial hybridization, only 47% of the entire sequence was called correctly. However, when the
hybridization results are combined with the results following RNase A treatment, approximately 87% of the 1302 bases
are called correctly. These results clearly demonstrate that RNase A is very effective in improving the quality of the
sequence information obtained from hybridization to oligonucleotide arrays.
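
The two percentages quoted above can be checked with a short calculation. The variable names are illustrative, and the sketch assumes that the combined figure is the sum of the initially correct calls, the corrected miscalls and the correctly resolved ambiguities, which is an interpretation of the text rather than a stated formula.

```python
TOTAL_BASES = 1302
CORRECT_AFTER_HYBRIDIZATION = 619
MISCALLS_CORRECTED = 162          # out of 218 miscalls
AMBIGUITIES_RESOLVED = 350        # out of 458 ambiguities

# Fraction called correctly after hybridization alone.
initial_fraction = CORRECT_AFTER_HYBRIDIZATION / TOTAL_BASES

# Fraction called correctly when the RNase A results are combined
# with the hybridization results (assumed to be a simple sum).
combined_correct = (CORRECT_AFTER_HYBRIDIZATION
                    + MISCALLS_CORRECTED
                    + AMBIGUITIES_RESOLVED)
combined_fraction = combined_correct / TOTAL_BASES

print(f"after hybridization: {initial_fraction:.1%}")
print(f"after RNase A:       {combined_fraction:.1%}")
```

Under that assumption the figures come out close to the "approximately 47%" and "approximately 87%" stated in the text.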

B. ENHANCED DISCRIMINATION USING LIGATION REACTIONS

The following examples illustrate the ability of ligation reactions to improve discrimination of base-pair mismatches
near the 5' end of an oligonucleotide probe. The ligation reaction of labelled, short oligonucleotides to the 5' end of
oligonucleotide probes on a chip should occur (in the presence of the enzyme ligase) wherever a probe:target hybrid
has formed with correct base-pairing near the 5' end of the probe and where there is a suitable 3' overhang of the target
to serve as a template for hybridization and ligation. In the following examples, the ligation reaction is used to improve
discrimination of base-pair mismatches near the 5' end of the probe, i.e., mismatches which are often poorly discriminated
following hybridization alone.

EXAMPLE I

In this example, a chip is made with probes having the following sequence: P-P-A-A-CGCGCCGCNC-5' wherein:
P is a polyethyleneglycol (PEG) spacer; A, C, and G are the usual deoxynucleotides; and N is either A, C, G, or T. The
chip is made using the standard VLSIPS protocols set forth above. The target oligonucleotide is a 20-mer having the
following sequence (listed 5' to 3'):

F1-GCGCGGCGCGAACGCAACGC

wherein: F1 is a fluorescein molecule covalently attached at the 5' end. The labelled, ligatable 6-mer used in this example
has the following sequence:

F1-TGCGTT.

The 5' half of the 20-mer target is complementary to the probes on the chip for which N is a G. The probe:target hybrids
for the other three probes have a single base mismatch one base in from the 5' end of the probe. The ligatable 6-mer
is complementary to the 3' overhang of the target when the target is hybridized to the probe to form the maximum number
of Watson-Crick hydrogen bonds.
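
The base-pairing relationships stated above can be verified mechanically. The following sketch is illustrative: the helper names are hypothetical, the fluorescein label, the PEG spacers and the two A residues next to the spacers are left out, and the probe is reversed so that every sequence is handled 5' to 3'.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

TARGET = "GCGCGGCGCGAACGCAACGC"     # 20-mer target, 5'->3'
LIGATABLE_6MER = "TGCGTT"           # ligatable 6-mer, 5'->3'
PROBE_3_TO_5 = "CGCGCCGCNC"         # probe as listed in the text (3'->5')


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def probe_mismatches(n_base):
    """Count mismatches between the probe (with N = n_base) and the 5' half of the target."""
    probe_5_to_3 = PROBE_3_TO_5.replace("N", n_base)[::-1]
    expected = reverse_complement(TARGET[:10])
    return sum(p != e for p, e in zip(probe_5_to_3, expected))


# The 3' overhang of the target, read from the base that sits next to
# the 5' end of the hybridized probe.
overhang = TARGET[10:]

if __name__ == "__main__":
    for n in "ACGT":
        print(n, probe_mismatches(n))
    print(reverse_complement(overhang[:6]) == LIGATABLE_6MER)
```

Only N = G gives a perfect match, and the 6-mer pairs with the six overhang bases immediately adjacent to the 5' end of the probe, which is the geometry needed for ligation.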

Prior to hybridization and ligation, the chip is treated with T4 Polynucleotide Kinase in order to phosphorylate the 5'
end of the probes. The probes are phosphorylated using 100 units of T4 Polynucleotide Kinase (New England Biolabs)
in 1 ml at 37°C for 90 minutes.

A 10 nM solution of the target oligo in 6XSSP-T (no EDTA in the hybridization buffer because EDTA could interfere
with subsequent ligation reactions) is hybridized to the chip for 30 minutes at 22°C. The chip is scanned, and then washed
with a large amount of water to remove the labelled target molecules.

The ligation reaction is carried out at 16°C in a 1 ml flow cell containing 10 nM target oligo, 20 nM ligatable 6-mer,
and 4000 units of T4 DNA Ligase (New England Biolabs). The buffer is the buffer recommended by the manufacturer
plus 150 mM NaCl. The reaction is allowed to proceed for 14 hours at 16°C, after which the chip is vigorously washed
with water at 50°C to remove the labelled target molecules. The only fluorescent label remaining after washing is that
of the ligatable 6-mers that have been covalently attached to the probes via the ligation reaction. The chip is scanned
and analyzed, and the results compared to those obtained from the hybridization reaction above.

[Table: HYB, HDF, LIG and LDF values for the probes with N = A, C, G and T. Only the entry 4.1 is legible.]



In the above table, N is the base in the probe that is one position in from the 5' end (see supra). For the target used
here, G is the complementary base. HYB and LIG are the signals (fluorescence counts) for the different probes following
hybridization and ligation, respectively. HDF and LDF are the discrimination factors (defined as the ratio of the
fluorescence signal with the perfect match, G, to the signal with the specified mismatch base) following hybridization and ligation,
respectively.

It is clear that after hybridization, the extent of target hybridization is very similar for the perfectly complementary
probe and the probes containing a mismatch near the 5' end. The A and C mismatches differ by only 10%, and the
maximum difference is only 40%. In contrast, following the ligation reaction, the discrimination is greatly improved, with
the minimum discrimination factor greater than 4. These data indicate that ligation reactions can be performed on
covalently attached oligonucleotide probes on the chip surface, that these reactions are specific for correctly base-paired
probe:target hybrids, and that the reaction can be used to improve the discrimination between perfect matches and
single base mismatches.

EXAMPLE II 

In this example, a chip was made with probes having the following sequences:

P-P-A-A-CGCGCATTCN-5' (denoted CG)

P-P-A-A-ATATAATTCN-5' (denoted AT)

A, T, C, G and N have the same definitions as those set forth in Example I, supra. These probes contain a perfect match
and the single-base mismatch sequences for the following 22-mer target oligos (listed 5' to 3'):

F1-GCGCGTAAGGCCTTCGACGTAG (denoted OH1)
F1-TATATTAAGGCCTTCGACGTAG (denoted OH2)

The 5' end of OH1 is complementary to the CG probes with N = C, and the 5' end of OH2 is complementary to the AT
probes with N = C. Both OH1 and OH2 have the same 12-mer sequence at the 3' end. The labelled, ligatable 6-mer
used in this example (appropriate for both OH1 and OH2 when hybridized to the CG and AT regions of the chip,
respectively) has the following sequence:

F1-CGAAGG (denoted L6B).

Prior to hybridization and ligation, the chip is phosphorylated as in Example I, supra, using T4 polynucleotide kinase
for 4 hours at 37°C. The hybridization and ligation conditions are the same as those used in Example I unless otherwise
specified. In particular, 2000 units of T4 DNA Ligase are used for the reaction here, and the concentration of the ligatable
6-mer is 10 nM rather than 20 nM.

The hybrids between OH1 and the CG probes on the chip contain a high proportion of C-G base pairs. C-G base
pairs are known to be considerably more stable than the A-T base pairs that are predominant in the hybrid between
OH2 and the AT probes on the chip. Thus, it is expected that OH1 will hybridize to its perfectly complementary probe
oligo to a greater extent than will OH2 under suitably stringent hybridization conditions. In fact, this is observed to be
the case in the hybridization experiments below. The ligation reaction, however, can be used to help mitigate the
complicating effects of the base composition dependence of hybridization.

The chip was initially hybridized with both OH1 and OH2 at 22°C for 30 minutes. The extent of hybridization to both
the CG and AT regions of the chip is analyzed. It is found that the fluorescence signal in the CG regions (OH1 hybrids)
is larger than in the AT regions (OH2 hybrids) by more than a factor of 14. In fact, the perfect match signal in the CG
region is quite strong, but the signal in the AT region is only slightly greater than twice the background.

           (OH1)            (OH2)
N       HYB     HDF      HYB*    HDF*
A       196     2.4        6     5.5
C       474     1.0       33     1.0
G       159     3.0       20     1.7
T       103     4.6        5     6.6

* These values are somewhat uncertain because the signal is not large relative to the background.

Following hybridization, the chip was washed extensively with water to remove the target molecules. A ligation
reaction is initiated on the chip by combining OH1, OH2, and L6B in 1 ml of ligation buffer and adding 2000 units of T4
DNA Ligase. The reaction is allowed to proceed for 34 hours at 22°C, and then for another 24 hours at 8°C. At each
stage, the chip is read and the data recorded and analyzed.

              34 hrs, T = 22°C                 24 hrs, T = 8°C
           (OH1)          (OH2)           (OH1)          (OH2)
N       LIG    LDF     LIG    LDF      LIG    LDF     LIG    LDF
A        18     56       3     31       27     46      10     88
C      1003    1.0      92    1.0     1234    1.0     879    1.0
G        13     44      23     13       24     51      30     29
T        15     67       3     31       22     56       8    110

It is striking that after the ligation reaction at 8°C, the signals for OH1 and OH2 differ by only a factor of 1.4, ten
times less than the factor of 14 that was observed following the original hybridization. It is even more striking that the
composition dependence is mitigated by virtue of the ligation reaction at low temperature with no loss of discrimination
for either OH1 or OH2.

EXAMPLE III

In order for the ligation strategy to be useful for unknown or more complex DNA targets, it is necessary to use a
pool of all possible (4096) 6-mers instead of a specific ligatable 6-mer. The 4096 6-mers are synthesized using standard
phosphoramidite chemical procedures on four separate columns, one beginning (at the 3' end) with A, one with C, one
with G, and one with T. Each of the 5 subsequent synthesis steps is performed using a mixture of A, C, G, and T
phosphoramidite, producing a mixture of all possible five base sequences on each of the four columns. The 6-mers are
labelled with fluorescein at the 5' end as the last step in the synthesis. After reversed-phase HPLC purification of the
four 6-mer pools, the concentration of each pool is determined by the absorption at 260 nm. The appropriate amounts
of each pool are mixed to make a solution that contains all 4096 labelled 6-mer oligonucleotides.

A chip is made containing 10-mer probes having the following sequences:

P-P-C-G-C-G-N1-N2-N3-N4-N5-N6-5'

wherein: Ni are A, C, G, or T. In other words, the chip contains 10-mers with all possible (4096) six base combinations
at the 5' end. The 5' phosphate group on the probes required for ligation is added chemically (using 5' Phosphate-ON,
Clontech Laboratories, Palo Alto, CA) as the last step in the synthesis of the chip, prior to deprotection of the bases.
The target oligo is a 22-mer having the following sequence (listed 5' to 3'):

F1-GCGCGTAAGGCCTTCGACGTAG (OH1)

The chip was initially hybridized with 10 nM OH1 in 6XSSP-T at 22°C for 30 minutes. The chip is read and analyzed.
The only perfect match probe for this target (i.e., P-P-CGCGCATTCC-5') has the second highest hybridization signal.
Eight other probes have hybridization signals that are within a factor of 4 of the perfect match signal. The other three
probes with a single base mismatch at the 5' end have discrimination factors of 2.0, 2.6, and 3.5, for G, A, and T,
respectively. Other single base mismatches at positions in from the 5' end of the probe give signals that are considerably smaller.
The chip is washed with water to remove the hybridized target.

The chip is next hybridized using the conditions used for the ligation reaction. The chip is hybridized with 10 nM
OH1 and 1.6 µM 6-mer pool (0.4 nM for each 6-mer oligo) in the ligation buffer for 11 hours at 22°C (no ligase at this
stage). The perfect match probe gives the highest signal by a factor of 2.4. Five probes have signals within a factor of
4 of the perfect match signal. The other three probes with a single base mismatch at the 5' end have discrimination
factors of 3.0, 3.6, and 8.0, for G, A, and T, respectively.
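
The composition of the 6-mer pool, and the per-oligo concentration quoted above, can be illustrated as follows. This is a sketch, not the synthesis procedure itself: `six_mer_pools` is a hypothetical helper, sequences are written 5' to 3' so that the fixed column base is the last character, and all oligos are assumed to be equally represented in the pool.

```python
from itertools import product


def six_mer_pools():
    """Return the four column pools, keyed by the fixed 3'-terminal base.

    Each pool holds the 4**5 = 1024 sequences obtained by coupling a
    mixture of A, C, G and T at the five positions after the first.
    """
    pools = {}
    for three_prime_base in "ACGT":
        pools[three_prime_base] = [
            "".join(five_bases) + three_prime_base
            for five_bases in product("ACGT", repeat=5)
        ]
    return pools


pools = six_mer_pools()
all_six_mers = [seq for pool in pools.values() for seq in pool]

# Per-oligo concentration in nM for a 1.6 micromolar total pool,
# assuming equal representation of every 6-mer.
POOL_CONCENTRATION_NM = 1600.0
per_oligo_nm = POOL_CONCENTRATION_NM / len(all_six_mers)

if __name__ == "__main__":
    print(len(all_six_mers), "6-mers in", len(pools), "pools")
    print(f"{per_oligo_nm:.2f} nM per 6-mer")
```

The specific ligatable 6-mers used in Examples I and II (TGCGTT and CGAAGG) are members of this pool, in the T and G columns respectively.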

The ligation reaction is initiated by the addition of 2000 units of T4 DNA ligase to the solution containing OH1 and
the pool of 6-mers. The reaction is allowed to proceed for 23 hours at 22°C. After washing the chip with water at about
45°C for five minutes, the chip is read. After ligation, no other probes have hybridization signals that are within a factor
of 4 of the perfect match signal. The three 5' single base mismatch probes all have discrimination factors greater than
12. Thus, with a complex chip containing 4096 probes with all possible 6-mer sequences at the 5' end, and using a pool
of all possible ligatable 6-mers, the ligation reaction is still specific for the perfectly complementary probe and affords
considerable increases in the discrimination between perfect matches and single-base mismatches.

EXAMPLE IV

In this example, a chip was made using the tiling strategy (A, C, G, T-containing probes for each base in the
sequence) described above that covers a 50 base region of the protease gene of HIV-1 (SF2 strain). The probes are
11-mers, linked to the glass support by three PEG linkers. The substitution position (the position being interrogated by
an A, C, G, or T base in the probe) is varied between the 5' end of the probe, and five bases in from the 5' end (referred
to as positions end, -1, -2, -3, -4 and -5). The chip is synthesized using standard VLSIPS protocols. Prior to hybridization
and ligation, the chip is phosphorylated using T4 polynucleotide kinase for 5 hours at 37°C. The target is a 75-mer
oligonucleotide (denoted Hprol), labelled at the 5' end with fluorescein, that spans the complementary 50 base region
on the chip.

The chip was initially hybridized with a 10 nM solution of Hprol in 6XSSP-T at 22°C for 30 minutes. After hybridization,
the chip was read, and then rinsed with water to remove the target molecules. A ligation reaction was then carried out
with 10 nM Hprol, 1.6 µM 6-mer pool (0.4 nM per oligo), and 2000 units of T4 DNA Ligase in 1 ml of ligation buffer. The
ligation reaction is allowed to proceed for 25 hours at 8°C, then 90 hours at 22°C, and finally 4 days at 8°C. At intervals
of 1 to 2 days, the solution is supplemented with additional T4 DNA Ligase. Following the ligation reaction, the chip is
washed vigorously with water at about 45°C for 10 minutes, leaving only the labelled 6-mers that have been ligated to
the probe molecules. The chip is read, and the data analyzed.

The results of the hybridization and ligation reactions are analyzed in terms of the ability to make a correct base
call from the fluorescence signal measured on the chip. In particular, the signal is compared between the four probes
that differ by a single base at a given position within the 11-mer, with the rest of the 11-mer being perfectly complementary
to a specific region of the target sequence. For the purposes of this experiment, a base identification is said to be made
if the signal in at least one of the four probe regions is greater than the signal in a nearby region that has no oligonucleotide
probes (the background) by at least 5 counts (the background counts are usually about 2 - 6 counts), and if the signal
in one of the four regions is greater than that in the other three related regions by at least a factor of 1.2. If none of the
four signals is larger than the other three by a factor of at least 1.2, a multiple base ambiguity results. If the most intense
hybridization signal (by a factor of at least 1.2) is for a probe that is not perfectly complementary to the target sequence,
then a miscall results.

Following hybridization, the 11-mer probes with substitution positions -1, -2, -3, and -4 all gave 49 correct base calls
and 1 multiple base ambiguity. The probe with substitution position -5 resulted in 50 correct base calls. Following ligation,
the probes with substitution positions -2 and -5 gave 48 correct calls and 2 miscalls, substitution position -3 yielded 46
correct calls and 1 ambiguity and 1 miscall, and substitution position -1 and -4 both yielded 50 correct calls with no
ambiguities or miscalls. These results indicate that the ligation reaction with the full pool of 6-mers can be used to
specifically label hybrids between relatively complex targets and arrays of oligonucleotide probes.

It is interesting to note that the pattern of ligation (stronger or weaker signals, better or worse discrimination) is not
in general the same as the pattern of hybridization. This suggests that these two approaches may be used as
complementary tools to obtain sequence information with arrays of oligonucleotide probes. For example, probes that produce
large hybridization signals, but are poorly discriminated, may be better treated using a ligation step. And probes that do
not hybridize well to a particular complementary target (leading to a signal that is too small relative to the background)
may ligate well enough to be clearly detected (as also suggested by the mitigation of the base composition dependence
demonstrated in Example II, supra).

C. PREPARATION OF UNIMOLECULAR, DOUBLE-STRANDED OLIGONUCLEOTIDES 

EXAMPLE I

This example illustrates the general synthesis of an array of unimolecular, double-stranded oligonucleotides on a
solid support.

Unimolecular double stranded DNA molecules were synthesized on a solid support using standard light-directed
methods (VLSIPS™ protocols). Two hexaethylene glycol (PEG) linkers were used to covalently attach the synthesized
oligonucleotides to the derivatized glass surface. Synthesis of the first (inner) strand proceeded one nucleotide at a time
using repeated cycles of photo-deprotection and chemical coupling of protected nucleotides. The nucleotides each had
a protecting group on the base portion of the monomer as well as a photolabile MeNPoc protecting group on the 5'
hydroxyl. Upon completion of the inner strand, another MeNPoc-protected PEG linker was covalently attached to the 5'
end of the surface-bound oligonucleotide. After addition of the internal PEG linker, the PEG was photodeprotected, and
the synthesis of the second strand proceeded in the normal fashion. Following the synthesis cycles, the DNA bases
were deprotected using standard protocols. The sequence of the second (outer) strand, being complementary to that
of the inner strand, provided molecules with short, hydrogen bonded, unimolecular double-stranded structure as a result
of the presence of the internal flexible PEG linker.

An array of 16 different molecules was synthesized on a derivatized glass slide in order to determine whether short,
unimolecular DNA structures could be formed on a surface and whether they could adopt structures that are recognized
by proteins. Each of the 16 different molecular species occupies a different physical region on the glass surface so that
there is a one-to-one correspondence between molecular identity and physical location. The molecules are of the form
S-P-P-C-C-A/T-A/T-A/T-A/T-G-C-P-G-C-A/T-A/T-A/T-A/T-G-G-F where S is the solid surface having silyl groups, P is a
PEG linker, A, C, G, and T are the DNA nucleotides, and F is a fluorescent tag. The DNA sequence is listed from the 3'
to the 5' end (the 3' end of the DNA molecule is attached to the solid surface via a silyl group and 2 PEG linkers). The
sixteen molecules synthesized on the solid support differed in the various permutations of A and T in the above formula.

EXAMPLE II

This example illustrates the ability of a library of surface-bound, unimolecular, double-stranded oligonucleotides to
exist in duplex form and to be recognized and bound by a protein.

A library of 16 different members was prepared as described in Example I. The 16 molecules all have the same
composition (same number of As, Cs, Gs and Ts), but the order is different. Four of the molecules have an outer strand
that is 100% complementary to the inner strand (these molecules will be referred to as DS, double-stranded, below).
One of the four DS oligonucleotides has a sequence that is recognized by the restriction enzyme EcoRI. If the molecule
can loop back and form a DNA duplex, it should be recognized and cut by the restriction enzyme, thereby releasing the
fluorescent tag. Thus, the action of the enzyme provided a functional test for DNA structure, and also served to
demonstrate that these structures can be recognized at the surface by proteins. The remaining 12 molecules had outer
strands that were not complementary to their inner strands (referred to as SS, single-stranded, below). Of these, three
had an outer strand and three had an inner strand whose sequence was an EcoRI half-site (the sequence on one strand
was correct for the enzyme, but the other half was not). The solid support with an array of molecules on the surface is
referred to as a "chip" for the purposes of the following discussion. The presence of fluorescently labelled molecules on
the chip was detected using confocal fluorescence microscopy. The action of various enzymes was determined by
monitoring the change in the amount of fluorescence from the molecules on the chip surface (e.g., "reading" the chip) upon
treatment with enzymes that can cut the DNA and release the fluorescent tag at the 5' end.
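
The makeup of the 16-member library can be reproduced with a short enumeration. The sketch rests on an assumption that is not stated in the text: that the four variable positions of each strand take the four self-complementary arrangements of two As and two Ts (AATT, TTAA, ATAT, TATA). That choice is consistent with the equal composition and with the counts given above, but it is an inference, and the names `CORES` and `build_library` are hypothetical. Strands are written 5' to 3'.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
ECORI_SITE = "GAATTC"

# Assumed variable cores (5'->3'); an inference, not given in the text.
CORES = ["AATT", "TTAA", "ATAT", "TATA"]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def build_library():
    """Enumerate inner/outer strand pairs and classify each molecule."""
    members = []
    for inner_core, outer_core in product(CORES, repeat=2):
        inner = "CG" + inner_core + "CC"   # surface-proximal strand
        outer = "GG" + outer_core + "CG"   # strand carrying the tag
        members.append({
            "inner": inner,
            "outer": outer,
            "duplex": outer == reverse_complement(inner),
            "inner_site": ECORI_SITE in inner,
            "outer_site": ECORI_SITE in outer,
        })
    return members


library = build_library()
ds = [m for m in library if m["duplex"]]
ds_ecori = [m for m in ds if m["inner_site"] and m["outer_site"]]
ss_outer_half = [m for m in library
                 if not m["duplex"] and m["outer_site"] and not m["inner_site"]]
ss_inner_half = [m for m in library
                 if not m["duplex"] and m["inner_site"] and not m["outer_site"]]

if __name__ == "__main__":
    print(len(library), len(ds), len(ds_ecori),
          len(ss_outer_half), len(ss_inner_half))
```

Under this assumption the enumeration yields 4 DS molecules (one carrying the full EcoRI site) and 12 SS molecules, of which three carry the half-site on the outer strand and three on the inner strand.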

The three different enzymes used to characterize the structure of the molecules on the chip were:

1) Mung Bean Nuclease - sequence independent single-strand specific DNA endonuclease;

2) DNase I - sequence independent double-strand specific endonuclease;

3) EcoRI - restriction endonuclease that recognizes the sequence (5'-3') GAATTC in double stranded DNA, and cuts
between the G and the first A.

Mung Bean Nuclease and EcoRI were obtained from New England Biolabs, and DNase I was obtained from Boehringer
Mannheim. All enzymes were used at a concentration of 200 units per mL in the buffer recommended by the manufacturer.
The enzymatic reactions were performed in a 1 mL flow cell at 22°C, and were typically allowed to proceed for 90 minutes.

Upon treatment of the chip with the enzyme EcoRI, the fluorescence signal in the DS EcoRI region and the 3 SS
regions with the EcoRI half-site on the outer strand was reduced by about 10% of its initial value. This reduction was
at least 5 times greater than for the other regions of the chip, indicating that the action of the enzyme is sequence specific
on the chip. It was not possible to determine if the factor is greater than 5 in these preliminary experiments because of
uncertainty in the constancy of the fluorescence background. However, because the purpose of these early experiments
was to determine whether unimolecular double-stranded structures could be formed and whether they could be
specifically recognized by proteins (and not to provide a quantitative measure of enzyme specificity), qualitative differences
between the different synthesis regions were sufficient.

The reduction in signal in the 3 SS regions with the EcoRI half-site on the outer strand indicated either that the
enzyme cuts single-stranded DNA with a particular sequence, or that these molecules formed a double-stranded
structure that was recognized by the enzyme. The molecules on the chip surface were at a relatively high density, with an
average spacing of approximately 100 angstroms. Thus, it was possible for the outer strand of one molecule to form a
double-stranded structure with the outer strand of a neighboring molecule. In the case of the 3 SS regions with the
EcoRI half-site on the outer strand, such a bimolecular double-stranded region would have the correct sequence and
structure to be recognized by EcoRI. However, it would differ from the unimolecular double-stranded molecules in that
the inner strand remains single-stranded and thus amenable to cleavage by a single-strand specific endonuclease such
as Mung Bean Nuclease. Therefore, it was possible to distinguish unimolecular from bimolecular double-stranded DNA
molecules on the surface by their ability to be cut by single and double-strand specific endonucleases.

In order to remove all molecules that have single-stranded structures and to identify unimolecular double-stranded
molecules, the chip was first exhaustively treated with Mung Bean Nuclease. The reduction in the fluorescence signal
was greater by about a factor of 2 for the SS regions of the chip, including those with the EcoRI half-site on the outer
strand that were cleaved by EcoRI, than for the 4 DS regions. Following Mung Bean Nuclease treatment, the chip was
treated with either DNase I (which cuts all remaining double-stranded molecules) or EcoRI (which should cut only the
remaining double-stranded molecules with the correct sequence). Upon treatment with DNase I, the fluorescence signal
in the 4 DS regions was reduced by at least 5-fold more than the signal in the SS regions. Upon EcoRI treatment, the
signal in the single DS region with the correct EcoRI sequence was reduced by at least a factor of 3 more than the
signal in any other region on the chip. Taken together, these results indicated that the surface-bound molecules
synthesized with two complementary strands separated by a flexible PEG linker form intramolecular double-stranded structures
that were resistant to a single-strand specific endonuclease and were recognized by both a double-strand specific
endonuclease and a sequence-specific restriction enzyme.

EXAMPLE III

This example illustrates the strategy employed for the preparation of a conformationally restricted hexapeptide.

A glass coverslip having aminopropylsilane spacer groups can be further derivatized on the amino groups with a
poly-A oligonucleotide comprising nine adenosine monomers using VLSIPS™ ("light-directed") methods. The tenth
adenine monomer to be added will be a 5'-aminopropyl-functionalized phosphoramidite (available from Glen Research or
Genosys Biotechnologies). To the amine terminus is then added, in stepwise fashion, the hexapeptide, RQFKWT,
beginning with the carboxyl end of the peptide (i.e., as T-W-K-F-Q-R). A 3'-succinylated nucleoside can then be added under
peptide coupling conditions and the nucleotide synthesis of the poly-T tail can be continued to provide a conformationally
restricted probe.

It is to be understood that the above description is intended to be illustrative and not restrictive. Many embodiments
will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should,
therefore, be determined not with reference to the above description, but should instead be determined with reference
to the appended claims, along with the full scope of equivalents to which such claims are entitled.

XV. Conclusion

The present invention provides greatly improved methods and apparatus for the study of nucleotide sequences and
nucleic acid interactions with other molecules. It is to be understood that the above description is intended to be illustrative
and not restrictive. Many embodiments and variations of the invention will become apparent to those of skill in the art
upon review of this disclosure. Merely by way of example, certain of the embodiments described herein will be applicable
to other polymers, such as peptides and proteins, and can utilize other synthesis techniques. The scope of the invention
should, therefore, be determined not with reference to the above description, but instead should be determined with
reference to the appended claims along with the full scope of equivalents to which such claims are entitled.



SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Affymax Technologies, N.V.

(B) STREET: De Reyderkade 62

(C) CITY: Curacao

(D) STATE:

(E) COUNTRY: Netherlands Antilles

(F) POSTAL CODE (ZIP):

(G) TELEPHONE:

(H) TELEFAX: 

(I) TELEX: 

(ii) TITLE OF INVENTION: Methods of Enzymatic Discrimination 
Enhancement and Surface-Bound Double-Stranded DNA 

(iii) NUMBER OF SEQUENCES: 42 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Hepworth Lawrence Bryer & Bizley

(B) STREET: Merlin House, Falconry Court, Baker's Lane, 

(C) TOWN: Epping 

(D) COUNTY: Essex 

(E) COUNTRY: UK 

(F) POST CODE: CM16 5DQ 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS /MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: EP 95 307501.7 

(B) FILING DATE: 20-OCT-1995 

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/327,522 

(B) FILING DATE: 21-OCT-1994 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/327,687 

(B) FILING DATE: 24-OCT-1994 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/533,582 

(B) FILING DATE: 18-OCT-1995 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Richard Edward Bizley

(B) REFERENCE/DOCKET NUMBER: APEP95996 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: +44 1992 561756 

(B) TELEFAX: +44 1992 561934 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 12 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

AGCCTAGCTG AA 12 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
TTCAGCTAGG CT 12

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 10 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TTTTTAAAAA



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
AAAAATTTTT 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
AAAGAAAAAA GACAGTACTA AATGGA



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
AGTACTGTNT TTTTT 



(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
TAGTACTGNC TTTTT



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
TTAGTACTNG CTTTT



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CTGTATCCGA CATCTGGTTA A 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CCAACCAAAC CCC 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCAACCAAAH NHN 13

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
ACTGTTAGCT AATTGG 16



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GGGGGGAGCT AACGGG 16 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
TACTGTATTT TTT 13 



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
TACTGTCTTT TTT



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
TACTGTGTTT TTT 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
TACTGTTTTT TTT 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GTACTGACTT TTT 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
GTACTGCCTT TTT



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GTACTGGCTT TTT



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GTACTGTCTT TTT 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
AGTACTATCT TTT 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
AGTACTCTCT TTT 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
AGTACTGTCT TTT 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
AGTACTTTCT TTT 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 
GGGNCCCTTA A 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
TAAAGTAAGA CATAAC



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
GGCTGACGTC AGCAAT 



(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
TTGCTGACAT CAGCC 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
TTGCTGACCT CAGCC 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
TTGCTGACGT CAGCC 15

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TTGCTGACTT CAGCC 15 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 12

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = adenine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
CNCGCCGCGC AN 12 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = guanine covalently modified
at the 5' hydroxyl group with a
fluorescein molecule"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
NCGCGGCGCG AACGCAACGC 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 12

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = adenine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
NCTTACGCGC AN 12 



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 12

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = adenine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
NCTTAATATA AN 12 



(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = guanine covalently modified
at the 5' hydroxyl group with a
fluorescein molecule"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
NCGCGTAAGG CCTTCGACGT AG 22



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = thymine covalently modified
at the 5' hydroxyl group with a
fluorescein molecule"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
NATATTAAGG CCTTCGACGT AG 22 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 10

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = cytosine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
NNNNNNGCGN 10

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 10

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = cytosine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
CCTTACGCGN 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

Arg Gln Phe Lys Val Val Thr
1 5 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Thr Val Val Lys Phe Gln Arg
1 5 
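
Several relationships among the listed sequences can be checked mechanically. The sketch below is offered by way of illustration only; the helper names are hypothetical, and the reading of SEQ ID NO: 28 as a target paired with SEQ ID NOS: 29 to 32 as probes is an inference drawn from the sequences themselves rather than a statement made in the listing. Under that reading, SEQ ID NO: 31 is a perfect match to the first fifteen bases of SEQ ID NO: 28, and SEQ ID NOS: 29, 30 and 32 each carry a single mismatch.

```python
# Illustrative consistency checks on selected entries of the sequence listing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs antiparallel with seq."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(probe: str, target: str) -> int:
    """Count mismatched positions when probe pairs with the 5' end of target."""
    footprint = reverse_complement(probe)
    return sum(a != b for a, b in zip(footprint, target))


SEQ = {
    1: "AGCCTAGCTGAA",
    2: "TTCAGCTAGGCT",
    3: "TTTTTAAAAA",
    4: "AAAAATTTTT",
    28: "GGCTGACGTCAGCAAT",
    29: "TTGCTGACATCAGCC",
    30: "TTGCTGACCTCAGCC",
    31: "TTGCTGACGTCAGCC",
    32: "TTGCTGACTTCAGCC",
}

# SEQ ID NO: 2 is the reverse complement of SEQ ID NO: 1.
print(reverse_complement(SEQ[1]) == SEQ[2])

# SEQ ID NOS: 3 and 4 are each self-complementary.
print(reverse_complement(SEQ[3]) == SEQ[3], reverse_complement(SEQ[4]) == SEQ[4])

# Mismatch counts for the assumed probes against the assumed target.
counts = {n: mismatches(SEQ[n], SEQ[28]) for n in (29, 30, 31, 32)}
print(counts)

# SEQ ID NO: 42 is SEQ ID NO: 41 read in the opposite direction.
print("RQFKVVT"[::-1] == "TVVKFQR")
```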
1. A method for sequencing a target nucleic acid, said method comprising:

(a) combining:

(i) a substrate comprising an array of chemically synthesized and positionally distinguishable oligonucleotides
each of which is complementary to a defined subsequence of preselected length; and

(ii) a target nucleic acid; thereby forming target-oligonucleotide hybrid complexes of complementary
subsequences of known sequence;

(b) contacting said target-oligonucleotide hybrid complexes with a nuclease; thereby removing
target-oligonucleotide complexes which are not perfectly complementary; and

(c) determining which of said oligonucleotides have specifically interacted with subsequences in said target
nucleic acid, to determine the sequence of said target nucleic acid.

2. The method as recited in claim 1 wherein said target nucleic acid is ribonucleic acid (RNA), optionally said nuclease
is an RNA nuclease, preferably RNase A.

3. A method for sequencing a target nucleic acid, said method comprising:

(a) combining:

(i) a substrate comprising an array of chemically synthesized and positionally distinguishable oligonucleotides
each of which is complementary to a defined subsequence of preselected length; and

(ii) a target nucleic acid which is longer than each of said probes; thereby forming target-oligonucleotide
hybrid complexes of complementary subsequences of known sequence with a 3' target overhang;

(b) contacting said target-oligonucleotide hybrid complexes with a ligase and a labelled, ligatable oligonucleotide
probe;

(c) removing unbound target nucleic acid and labelled, unligated oligonucleotide probes; and

(d) determining which of said oligonucleotides contain said labelled, ligatable oligonucleotide probe as an
indication of a subsequence which is complementary to a subsequence of said target nucleic acid.

4. The method as recited in claim 1 or claim 3 wherein said target nucleic acid is deoxyribonucleic acid (DNA).

5. The method as recited in claim 4 when dependent on claim 1 wherein said nuclease is a DNA nuclease, preferably
DNA nuclease S1 nuclease or Mung Bean nuclease.

6. The method as recited in any preceding claim wherein said array of oligonucleotides recognizes substantially all
possible subsequences of preselected length found in said target nucleic acid.

7. The method as recited in any preceding claim, wherein each oligonucleotide is of a length between about 6 and 20
bases, preferably between about 8 and 15 bases.

8. The method as recited in any preceding claim, wherein said array of oligonucleotides comprises about 1,000 different
oligonucleotides, preferably about 3,000 different oligonucleotides, preferably about 10⁴ different oligonucleotides,
more preferably about 10⁵ different oligonucleotides, even more preferably about 10⁶ different oligonucleotides.

9. The method as recited in any one of claims 3, 4 or 6 to 8 wherein said ligase is a member selected from the group
consisting of T4 DNA ligase, ligases isolated from E. coli and ligases isolated from bacteriophages.

10. A method for sequencing an unlabeled target oligonucleotide, said method comprising:

(a) combining:

(i) a substrate comprising an array of positionally distinguishable oligonucleotide probes each of which has
a constant region and a variable region, said variable region capable of binding to a defined subsequence
of preselected length;

(ii) a constant oligonucleotide having a sequence which is complementary to said constant region of said
oligonucleotide probes;

(iii) a target oligonucleotide to be sequenced; and

(iv) a ligase, thereby forming target-oligonucleotide hybrid complexes of complementary subsequences of
known sequence;

(b) contacting said target oligonucleotide-oligonucleotide probe hybrid complexes with a ligase and a pool of
labelled, ligatable oligonucleotide probes of a preselected length, said pool of labelled, ligatable oligonucleotide
probes representing all possible sequences of said preselected length;

(c) removing unbound target nucleic acid and labelled, unligated oligonucleotide probes; and

(d) determining which of said oligonucleotide probes contain said labelled, ligatable oligonucleotide probe as
an indication of a subsequence which is complementary to a subsequence of said target oligonucleotide.

11. A method for sequencing an unlabelled target oligonucleotide, said method comprising:

(a) contacting an unlabelled target oligonucleotide with a library of labelled oligonucleotide probes, each of said
oligonucleotide probes having a known sequence and being attached to a solid support at a known position, to
hybridize said target oligonucleotide to at least one member of said library of probes, thereby forming a
hybridized library;

(b) contacting said hybridized library with a nuclease capable of cleaving double-stranded oligonucleotides to
release from said hybridized library a portion of said labelled oligonucleotide probes or fragments thereof; and

(c) identifying said positions of said hybridized library from which labelled probes or fragments thereof have
been removed, to determine the sequence of said unlabelled target oligonucleotide.

12. A synthetic unimolecular, double-stranded oligonucleotide library comprising a plurality of different members, each
member having the formula:

Y-L¹-X¹-L²-X²

wherein,

Y is a solid support;

X¹ and X² are a pair of complementary oligonucleotides;
L¹ is a spacer;

L² is a linking group having sufficient length such that X¹ and X² form a double-stranded oligonucleotide.

13. A library in accordance with claim 12, wherein L² is a member selected from the group consisting of an alkylene
group, a polyethyleneglycol group, a polyalcohol group, a polyamine group and a polyester group.

14. A library in accordance with claim 12 or claim 13, wherein X¹ and X² are complementary oligonucleotides each
comprising of from 6 to 30 nucleic acid monomers.

15. A library in accordance with any one of claims 12 to 14, wherein said solid support is a silica support and comprises
an aminoalkylsilane and from 1 to 4 hexaethyleneglycols.

16. A synthetic unimolecular, double-stranded oligonucleotide library of any one of claims 12 to 15, wherein a portion
of said double-stranded oligonucleotides formed by X¹ and X² further comprise a bulge or a loop.

17. A synthetic unimolecular, double-stranded nucleic acid library of any one of claims 12 to 16, wherein each member
further comprises an identifier tag, said identifier tag identifying the sequence of said unimolecular, double-stranded
nucleic acid.

18. A synthetic unimolecular, double-stranded nucleic acid library of any one of claims 12 to 17, wherein said solid
support comprises a first bead linked to a second bead, wherein the double-stranded nucleic acid is attached to the
first bead and an identifier tag is attached to the second bead.

19. A method of forming a plurality of diverse unimolecular, double-stranded oligonucleotides on a solid support having
optional spacers, said support comprising a surface with a plurality of preselected regions, said method comprising:

(a) forming on each of said preselected regions a different first oligonucleotide, each of said first oligonucleotides
comprising of from 6 to 30 monomers;

(b) attaching to the distal end of each of said first oligonucleotides of step (a) a linking group; and

(c) forming on the distal end of each of said linking groups a second oligonucleotide, wherein each of said
second oligonucleotides is complementary to said first oligonucleotide which is attached within the same
preselected region, and wherein said linking groups have sufficient length such that said first and second
oligonucleotides form a unimolecular, double-stranded oligonucleotide.

20. A method of screening a sample for a species capable of binding to double-stranded DNA comprising:

contacting said sample with a solid support comprising unimolecular, double-stranded DNA attached thereon,
each of said attached DNA independently having the formula:

-X¹¹-L-X¹²

wherein,

X¹¹ and X¹² are complementary oligonucleotides; and

L is a linking group having sufficient length such that X¹¹ and X¹² form said attached unimolecular,
double-stranded DNA, to produce at least one bound pair comprising said species and one of said attached unimolecular,
double-stranded DNA; and

identifying said bound pair.

21. A method in accordance with claim 20, wherein said species is a member selected from the group consisting of a
drug, a protein and an RNA molecule.

22. A method of screening a sample for a species capable of binding to double-stranded DNA comprising:

contacting said sample with a solid support comprising a unimolecular, double-stranded DNA attached
thereon, said attached DNA having the formula:

-X¹¹-L-X¹²

wherein,

X¹¹ and X¹² are complementary oligonucleotides; and

L is a linking group having sufficient length such that X¹¹ and X¹² form said attached unimolecular,
double-stranded DNA, to produce a bound pair comprising said species and said attached unimolecular, double-stranded
DNA; and

identifying said bound pair.

23. A synthetic conformationally-restricted probe library comprising a plurality of members, each of said members
comprising a solid support attached to an oligomer having the formula:

-X¹¹-Z-X¹²

wherein,

X¹¹ and X¹² are complementary oligonucleotides; and

Z is a probe having sufficient length such that X¹¹ and X¹² form a double-stranded portion of said member
and thereby restrict the conformations available to said probe.

24. A synthetic library in accordance with claim 23, wherein each of said probes is a peptide having of from about 4 to
about 12 amino acids and optionally each member further comprises an intercalating dye.

25. A method of synthesizing a library of conformationally-restricted probes on a solid support having optional spacers,
said support comprising a surface with a plurality of preselected regions, said method comprising:

(a) forming on each of said preselected regions a first oligonucleotide, each of said first oligonucleotides
comprising of from 6 to 30 monomers;

(b) attaching to the distal end of each of said first oligonucleotides of step (a) a probe; and

(c) forming on the distal end of each of said probes a second oligonucleotide, wherein each of said second
oligonucleotides is complementary to said first oligonucleotide which is attached within the same preselected
region, and wherein said probes have sufficient length such that said first and second oligonucleotides form a
unimolecular, double-stranded oligonucleotide thereby conformationally-restricting said probes.

26. A method in accordance with claim 19 or claim 25, wherein said method of construction of step (a) and step (b) is
by light-directed synthesis.

27. A method of screening a sample for a species capable of binding to a conformationally-restricted probe comprising:

contacting said sample with a solid support comprising conformationally-restricted probes attached thereon,
each of said attached probes independently having the formula:

-X¹¹-Z-X¹²

wherein,

X¹¹ and X¹² are complementary oligonucleotides;
and Z is a probe having sufficient length such that X¹¹ and X¹² form a double-stranded oligonucleotide portion of
said conformationally-restricted probe, to produce at least one bound pair comprising said species and one of said
attached conformationally-restricted probes; and
identifying said bound pair.

28. An adhesive for use in biological applications comprising a first surface having a plurality of attached oligonucleotides
and a second surface having a plurality of attached oligonucleotides, wherein the oligonucleotides of said first surface
are substantially complementary to the oligonucleotides of said second surface.

29. A synthetic intermolecular, doubly-anchored, double-stranded oligonucleotide library comprising a plurality of
different members, each member having the formula:



wherein,

Y is a solid support;

X¹ and X² are a pair of complementary oligonucleotides;
L¹ and L² are each independently a bond or a spacer.

30. A library in accordance with claim 29, wherein L¹ and L² each independently comprise a member selected from the
group consisting of an alkylene group, a polyethyleneglycol group, a polyalcohol group, a polyamine group and a
polyester group, preferably L¹ and L² each independently comprise a polyethylene glycol group.

31. A library in accordance with claim 29 or claim 30, wherein X¹ and X² are complementary oligonucleotides each
comprising of from 6 to 30 nucleic acid monomers.

32. A library in accordance with any one of claims 29 to 31, wherein said solid support is a silica support and L¹ comprises
an aminoalkylsilane and from 1 to 4 hexaethyleneglycols.

33. A method of preparing a single-stranded nucleic acid sequence, said method comprising:

(a) forming a hybrid complex by combining at least two oligonucleotides which are phosphorylated at their 5'
ends with a chip-bound oligonucleotide, said chip-bound oligonucleotide having subsequences which are
complementary to a subsequence of each of said oligonucleotides;

(b) contacting said hybrid complex with a ligase to form a ligated oligonucleotide; and

(c) releasing said ligated oligonucleotide from said chip-bound oligonucleotide to form a single-stranded nucleic
acid sequence.
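
By way of illustration only, the readout logic underlying the method of claim 1 can be sketched as follows. The sketch is an idealization and not the claimed method itself: it assumes that nuclease treatment leaves signal only at probes perfectly complementary to some subsequence of the target, that the array carries every possible probe of length k, and that no subsequence of length k-1 repeats within the target. All function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs antiparallel with seq."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_array(k: int) -> list:
    """All 4**k probes of length k, standing in for the positions of an array."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def perfect_matches(target: str, probes: list) -> set:
    """Probes assumed to retain signal: those perfectly complementary to the target."""
    return {p for p in probes if reverse_complement(p) in target}


def reconstruct(subsequences: set, k: int):
    """Chain subsequences by (k-1)-base overlap; return None if the chain is ambiguous."""
    kmers = set(subsequences)
    starts = [s for s in kmers if not any(t[1:] == s[:-1] for t in kmers)]
    if len(starts) != 1:
        return None
    seq, used = starts[0], {starts[0]}
    while True:
        nxt = [t for t in kmers - used if t[:-1] == seq[-(k - 1):]]
        if not nxt:
            break
        if len(nxt) > 1:
            return None
        seq += nxt[0][-1]
        used.add(nxt[0])
    return seq if len(used) == len(kmers) else None


k = 5
target = "AGCCTAGCTGAA"  # SEQ ID NO: 1, used here purely as test input
hits = perfect_matches(target, probe_array(k))
subsequences = {reverse_complement(p) for p in hits}
print(len(hits), reconstruct(subsequences, k))
```

With shorter probes the same target may not be uniquely recoverable, since overlaps of length k-1 can then repeat; the function returns None in that case rather than guessing.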